OSP was invited to Seoul to take part in the Typojanchi exhibition and organise a workshop at the international typography biennale in October 2013. I was lucky enough to be part of the group that went to Korea.
Some OSP members had met Na Kim, the curator of the exhibition, in Chaumont last year, and she was the one who got OSP to Seoul. The main focus of the exhibition was the visual identity work done by the group for the Théâtre de la Balsamine in Schaerbeek, Brussels, which showcased many kinds of typographical experiments alongside shapeshifting layouts, and methods of.
Typography being at the center of all of this, we were rapidly seduced by how beautiful the Korean alphabet and language are. Hangul is a unique script that has a physical basis for the shape of its glyphs (the shape the tongue takes to produce the corresponding sound is the inspiration for the strokes of the consonant letterforms, for example) and a great agglutinative method of building what we Latin script users would almost call ligatures.
With this as inspiration, we decided to continue a path that OSP had been tracing, in various instances, through the optical character recognition process, and to base the workshop on the fairly recently open-sourced text recognition software Tesseract.
Here is the description of the workshop we sent to Typojanchi:
The fancy reading machines and androids of science-fiction fantasies are embodied in our modern, lower-profile real world as OCR software packages. One of them, the free and open source Tesseract –1–, is composed of two parts that we can study, thanks to its licence.
There is the engine itself, and the training data for a language –2–, partly based on what Tesseract calls 'prototypes'. We could compare this 'before type' (proto-type) to the culture a reader progressively gathers from the very first lesson, going from novice to fully grown expert.
By following the limit between the blank surfaces and the dark pixels of the shapes of letters, Tesseract compares its journey with previous ones, made on images it has already followed in the past. It starts by learning the patterns and specificities of languages, their rhythms and irregularities. It goes on to recognise the body of a glyph, then it works out, bit by bit, whether this glyph is a letter and whether this form is a word, and eventually it makes out phrases. Like all of us, Tesseract learns typography in this same process, in a completely intertwined way, as sentences, script and eventually, language. –3–
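To make "following the limit" a little more concrete, here is a minimal sketch in Python. It is not Tesseract's own outline tracer, only an illustration under our own assumptions: the page is a small grid of 0s (blank) and 1s (dark), and the hypothetical function `outline_pixels` keeps the dark pixels that touch a blank pixel or the edge of the image.

```python
def outline_pixels(bitmap):
    """Return (row, col) of dark pixels that touch a blank pixel or the image edge.

    `bitmap` is a list of rows, each a list of 0 (blank) or 1 (dark).
    This is an illustrative stand-in, not how Tesseract traces outlines.
    """
    rows, cols = len(bitmap), len(bitmap[0])
    edge = []
    for r in range(rows):
        for c in range(cols):
            if not bitmap[r][c]:
                continue  # blank pixel: not part of any shape
            neighbours = [(r - 1, c), (r + 1, c), (r, c - 1), (r, c + 1)]
            for nr, nc in neighbours:
                outside = not (0 <= nr < rows and 0 <= nc < cols)
                if outside or not bitmap[nr][nc]:
                    edge.append((r, c))  # this dark pixel sits on the limit
                    break
    return edge


# A 3x3 dark square on a 5x5 blank page: everything but its centre is outline.
page = [[1 if 1 <= r <= 3 and 1 <= c <= 3 else 0 for c in range(5)] for r in range(5)]
print(outline_pixels(page))
```

The list that comes back is the kind of raw material a reading machine could then compare against the journeys it remembers.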
Tesseract follows rules by which it can make decisions. In a basic example from Latin script, if the software seems to be recognising something resembling iii (three times the letter 'i'), specific rules kick in to suggest that it is most likely the letter 'm' and not a triple vowel. Grammar and language come in at a later stage, as they did for us, still following this unusual idea of teaching software to read. –4– The very specificities of typography, and how each shape is drawn and could or couldn't be told apart from another one, arrive just after: in the previous example, the small parts that protrude from the i could form the arch of the m more convincingly in a serif font than in a sans serif one.
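The iii example can be replayed as a toy rule in Python. This is a sketch of the idea only, not Tesseract's actual decision logic: `merge_stem_runs` is a hypothetical name, and the rule simply rewrites any run of three 'i' in a raw reading as a single 'm'.

```python
import re


def merge_stem_runs(raw_reading: str) -> str:
    """Toy rule: three 'i' in a row are more plausibly one 'm'.

    Illustrative only; a real engine weighs many such hints together
    instead of applying one blind substitution.
    """
    return re.sub(r"i{3}", "m", raw_reading)


print(merge_stem_runs("huiiian"))  # the three stems collapse into one letter
```

A rule this blunt would obviously go wrong somewhere, which is exactly why grammar and language have to come in afterwards.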
This process does become intertwined with the actual context: with time, the system becomes familiar with, and extremely efficient at, some specificities of a typeface. Its shape, its overall form and size now mean something. It would have to relearn an entirely new toolkit to be able to read a different typeface. With this, could the relations binding shapes to their meanings be noticed?
At the young, naive and early stages of deciphering writing systems, slowly working out the building blocks of a legible language, we wonder how synthetic constructions (like Hangul) compare to agglutinated ones (like Latin). More specifically, how do these methods influence OCR data?
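One way to see how synthetic a Hangul syllable is: in Unicode, every precomposed syllable block can be taken apart again with plain arithmetic. The sketch below uses the constants of the Unicode Hangul syllable composition scheme (19 leading consonants, 21 vowels, 28 trailing slots including "none"); the function name `decompose_syllable` is our own.

```python
S_BASE, L_BASE, V_BASE, T_BASE = 0xAC00, 0x1100, 0x1161, 0x11A7
L_COUNT, V_COUNT, T_COUNT = 19, 21, 28


def decompose_syllable(syllable: str) -> list[str]:
    """Split one precomposed Hangul syllable into its conjoining jamo."""
    index = ord(syllable) - S_BASE
    if not 0 <= index < L_COUNT * V_COUNT * T_COUNT:
        raise ValueError(f"{syllable!r} is not a precomposed Hangul syllable")
    lead = index // (V_COUNT * T_COUNT)              # leading consonant
    vowel = (index % (V_COUNT * T_COUNT)) // T_COUNT  # medial vowel
    tail = index % T_COUNT                            # trailing consonant, 0 = none
    jamo = [chr(L_BASE + lead), chr(V_BASE + vowel)]
    if tail:
        jamo.append(chr(T_BASE + tail))
    return jamo


print(decompose_syllable("한"))  # lead, vowel and tail of the syllable
```

Nothing comparable exists for a Latin word: there the letters simply sit next to each other, and it is up to the reader, human or machine, to decide where one ends and the next begins.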
On a more contemporary note, it would be hard to deny how much screens and screen text technologies have influenced typography these days. All languages carry different meanings and different cultures with their characters. These gri(d)tty displays are no favor to typographic heritage, but they have brought on some interesting conundrums. The hinting tool ttfautohint, for example, deliberately distorts the vector shapes of glyphs to optimise screen rendering –5–. When the movement to follow the grid becomes displacement to fit, the boundary between the canvas-based (stable and territorial) and the flux-based (flexible and moving) blurs.
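The "displacement to fit" can be caricatured in a few lines of Python. This is not ttfautohint's algorithm, which is far more careful about stems and alignment zones; it is only a sketch, with a hypothetical `snap_to_grid` function, of what happens when outline points given in font units are pulled onto the nearest pixel boundary for a given size in pixels per em.

```python
def snap_to_grid(points, units_per_em=1000, ppem=12):
    """Move each (x, y) outline point to the nearest pixel boundary.

    `units_per_em` and `ppem` (pixels per em) are illustrative defaults,
    not values taken from any particular font or tool.
    """
    unit_per_pixel = units_per_em / ppem
    return [
        (round(x / unit_per_pixel) * unit_per_pixel,
         round(y / unit_per_pixel) * unit_per_pixel)
        for x, y in points
    ]


# At 10 pixels per em, one pixel spans 100 font units.
print(snap_to_grid([(130, 480), (260, 20)], units_per_em=1000, ppem=10))
```

The smaller the size, the coarser the grid and the further each point has to travel from where the type designer put it.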
In this workshop, we propose to carefully replay some of the processes the OCR system uses to reread typography from the departure point of any new learner, the one we all knew at first and have mostly, definitively, forgotten by now... By patiently observing the various parameters at play when one letter is to be differentiated from another, the thin and variable line of separation between signification and shape, between letter and typography, begins to reveal itself. Could the different parts of letters that make up the bare bones of other letters be recreated in a kind of wild, reverse-engineered Metafont –6– paradigm, where all of the shapes of the glyphs are defined with geometrical equations?
We wonder how much we can learn from methods borrowed from OCR. What if we replay its methods while relying on only some of its parameters, aiming not for full comprehension but for a basic knowledge of how our different sets of characters work, retracing only its first steps? Would the outcome of this be enough to go on to understand typographic subtleties, enabling a bridge between specificities in shape and specificities in language?
Finally, if we know that Hangul and Latin are organised differently, and yet that they work along similar ideas, could we try to avoid the main pitfalls of forcing comparisons between the two? Instead, can we focus on the systems that the OCR-by-human must use to read both, in order to rethink the deeper specificities between the two composition methods, between these two typographies, between these languages?
- Here is a link to the git repository we used for the project
- And a link to the pdf of the little booklet we printed on the last day