Writing Systems: Signs, Icons, Symbols and Abstraction
How does abstraction help to make writing a tractable problem for the human brain?
In my last post, I wrote about abstraction in computer engineering. In today’s post, I want to start laying the foundation for looking at abstraction in two other fields: visual arts and linguistics. To do that, we’ll start in a place where the two fields overlap: writing systems.
First, we need to define our terms. The study of writing systems falls within the field of semiotics, which has two intellectual fathers, Ferdinand de Saussure and Charles Sanders Peirce. In his Course in General Linguistics, Saussure articulated his concept of the “linguistic sign”:
For some people, a language, reduced to its essentials, is a nomenclature, a list of terms corresponding to a list of things... This conception is open to a number of objections... A linguistic sign is not a link between a thing and a name, but between a concept and a sound pattern. (trans. Roy Harris)
Several paragraphs later, in order to avoid the possibility of a “sign” being conflated with its constituent “sound pattern”, Saussure replaces “concept” and “sound pattern” with signifié, “signified”, and signifiant, “signifier”. (Roy Harris’s translation renders these as “signification” and “signal”, but I’ll go with “signified” and “signifier”, which are the better-known renditions.)
Note that Saussure’s definition of a sign here is restricted to what he calls linguistic signs. He acknowledges that his approach can be used in semiotics, but his chief interest (at least in the Course) is in applying the concept of the sign to linguistics.
For Saussure, the linguistic sign has the particular property of being arbitrary. Saussure in fact asserts that “the link between signifier and signified is arbitrary,” and contrasts signs with symbols:
... it is characteristic of symbols that they are never entirely arbitrary. They are not empty configurations. They show at least a vestige of natural connection between the signifier and the signified. For instance, our symbol of justice, the scales, could hardly be replaced by a chariot.
Here we run into a problem of terminology. The other founding father of semiotics, Charles Sanders Peirce, used the term “symbol” in an entirely different way:
... There are three kinds of representations. 1st. Those whose relation to their objects is a mere community in some quality, and these representations may be termed Likenesses [the more common term is “Icons”]. 2nd. Those whose relation to their objects consists in a correspondence in fact, and these may be termed Indices or Signs. 3rd. Those the ground of whose relation to their objects is an imputed character, which are the same as general signs, and these may be termed Symbols. (from On a New List of Categories)
Maybe it’s because I’m more familiar with the substance of Saussure’s work than with Peirce’s, but I find Peirce’s language very opaque and hard to grasp. Peirce’s theory is expansive and detailed, but I’m not trying to lay out an entire theory of semiotics here. I just want to define the terms “sign”, “icon”, “index” and “symbol” for our purposes.
So, to make life easier for everyone...
|Signifier|According to Peirce|According to Saussure|What I'll Call It|
|---|---|---|---|
|🍎|an icon of an apple: the signifier shares a Likeness with its signified.|a symbol of an apple: the signifier has a natural relationship to its signified.|icon|
|🍴|an index of eating: the signifier does not depict eating itself, but instead depicts the instruments of eating, which point to the act of eating.|a symbol: the signifier has a natural relationship to its signified, even if one step removed.|symbol|
|⚖️|an index of justice: the signifier does not depict justice itself, but instead depicts a metaphorical instrument of justice, which points to justice.|a symbol of justice: the signifier has a natural relationship to its signified, even if metaphorical.|symbol|
|⚠️|a symbol of danger: the signifier has no natural relationship to its signified. If we think “danger” when we see the signifier, it is because we have imputed the signifier with the character of “danger”.|a sign of danger: the signifier has an arbitrary relationship to its signified. If we think “danger” when we see the signifier, it is because we have associated it with this signifier purely by convention.|symbol|
This is an analysis of Peirce through the lens of Saussure, which is perhaps unfair to Peirce. Peirce defined “signifier” more precisely than Saussure did, and included an “interpretant” in his model, which Saussure left out. We don’t have to go there. (Yet.)
What do these signifiers represent?
This charts the evolution of the Chinese writing system. Oracle bone script and bronzeware script were contemporaneous, as were large seal script and small seal script. Regular script is what modern Chinese writing looks like.
From left to right, the logograms are the words for "water", "tree" or "wood", "moon", and "mountain". In all of these cases, the logogram begins as an icon and evolves to become a symbol.
You might argue that 木 and 山 still have iconic qualities, and given our definitions of “iconic” and “symbolic”, that’s not an unreasonable argument. Icons and symbols, as we have defined them, exist on a spectrum, not as binary states.
Icons are primarily representational: their relationship with their signifieds is that they resemble their signifieds in some recognisable way.
Symbols, on the other hand, are primarily abstract: their relationship with their signifieds is not necessarily recognisable, and their meaning is established through repeated, collective use.
Representational art and abstract art are often contrasted with each other, but our definitions of icons and symbols suggest that representational and abstract art, too, exist on two ends of the same scale:
(Note that in this model, the symbolic is abstract, but the abstract is not always symbolic — think about Jackson Pollock’s No. 5, for instance. Additionally, in this model, art that is rich in symbolism is not necessarily abstract. For all its symbolism, Dalí’s Persistence of Memory falls on the representational end of the spectrum; his tree looks like a real-world tree and his melting clocks look like real-world melting clocks.)
Remember the characterisation of programming languages as being either low-level and “close to the metal”, having a low level of abstraction, or high-level and having a high level of abstraction? A low-level language shows you — and makes you take care of — every little detail of what’s happening at the level of the CPU. A high-level language removes various aspects of what’s happening at the CPU level from your field of view — it abstracts them away, so that you can work more efficiently.
That’s what happens with icons and symbols, too. As icons are used more and more often, they morph. Irrelevant details are stripped away, leaving only what is essential for communicating the intended meaning, and the icons edge towards the symbolic, abstract end of the spectrum.
In the case of the Chinese writing system, the movement away from the representational to the abstract was an entirely organic process. As each generation of calligraphers followed another, the calligraphic style of the Chinese writing system evolved and arrived at the current system. (Whether the “current system” is Traditional or Simplified is a different matter, and the evolution of the two, and the differences between them, are worth exploring another time.)
Now the question is: as representational, iconic signifiers gradually become abstract, symbolic signifiers, what exactly is it that gets abstracted away?
A writing system differs from visual art in one key respect. If I draw a river, my drawing of a river is the signifier, and the thing it signifies is the idea of a river that looks reasonably similar. When you look at my river drawing, it will bring to your mind an image of what such a river might look like in real life.
Now, when it comes to writing systems, here’s what Saussure has to say:
A language and its written form constitute two separate systems of signs. The sole reason for the latter is to represent the former. (emphasis mine)
With a writing system, the signified is not an image of the thing that exists in the real world. The signified is the sound pattern of the word, the set of sounds that make up the spoken word. The sound pattern of the word, if you remember Saussure up top, is itself a signifier that refers to the concept that’s brought to mind when you hear the word. (You could say that a writing system is already one level of abstraction removed from drawing. Ba-dum-tss!)
If we look at the progression of the Chinese writing system, the signifiers start out with curved lines. The strokes can move in any direction. It is the shape of the signifier that matters; the relationship of individual strokes to one another is less important. Even then, there’s already a clear difference between the oracle bone script and the bronzeware script. The writing medium probably has something to do with this: the bronzeware inscriptions were made on wet clay molds before the bronze was cast, allowing for a greater level of detail and the use of more curved lines. On the other hand, oracle bone script tends to favour straight lines and simplified logograms (my preferred term for Chinese characters). The process of abstraction is already visible, even at this early stage.
A writing system needs to have certain properties. It needs to be easy to reproduce, and easy to parse visually. The Chinese writing system has thousands of logograms, all of which have to be distinct from one another. The vast majority of them are not iconic or ideographic (they’re modified rebuses, another type of abstraction that we’ll discuss in a moment), but they still need to be visually identifiable at a glance. If you tried to create a writing system built around hundreds, if not thousands, of iconic shapes, readers and writers would spend an inordinate amount of time in the nitty-gritty of ink and paper, trying to distinguish one shape from another.
Well, once the logograms are widely known and recognised as corresponding to a spoken sound, the link between the iconic signifier and its eventual signified can be broken. No longer does a logogram have to recall its real-world referent: readers and writers of the language only need to associate the logogram with the corresponding sound in the spoken language. This gives the writing system room to evolve in a more abstract direction. Logograms need not be iconic. This turns out to have a major effect on the Chinese writing system, as we will see in a second.
We can see that as the writing system evolves, the lines straighten and become rectilinear. That makes sense: straight lines are easier to reproduce consistently than curved lines. Moreover, it’s not the shape of the logogram that matters now, it’s the strokes and their relationship to one another. That makes it possible to create thousands of logograms that are easy to distinguish from one another.
Effectively, the rectilinear scripts abstracted shapes and curves into lines, angles and hooks. This prefigures the kind of abstraction we later see in abstract art.
Rebuses and Modified Rebuses
Consider the numbers 1 to 4 in Chinese:
It’s easy to see how the logograms 一，二，三 came about: they’re visual ideographic representations of the concept of 1, 2 and 3. What about 四?
It turns out that 四 is a rebus. Here’s the historical evolution of the written form of “four” in Chinese:
The reconstructed Old Chinese pronunciation of 四 is *hljids (if you’re curious about what that sounds like, as I was, you can listen to the Old Chinese numbers here). The Chinese languages have a large number of homophones and near-homophones, different words that share the same pronunciation or are similar-sounding, and this proved to be a key factor in the development of their writing system. As far as we know, *hljids is also how the Old Chinese word for “nostrils” was pronounced.
四 was originally a logogram, probably iconic, for the homophone *hljids, meaning “nostrils”. Oracle bone script and bronzeware script were contemporaneous, and we can see that the Chinese used four horizontal lines 亖 in the style of 一，二，三 when writing on the hard oracle bone, but opted for the homophone 四 when writing on the more forgiving soft clay. Presumably, the difficulty of distinguishing 三 from 亖 at a glance led writers to favour the use of the logogram 四 for 4 instead wherever possible, and eventually 四 became the standard form while 亖 fell out of use.
The appearance of the rebus is significant. Just as a high-level programming language abstracts away entire layers of nitty-gritty computational detail that slow humans down, rebuses in the Chinese writing system abstract away the need to create an iconic or ideographic logogram to represent each concept. We can think of this in terms of layers of abstraction, too:
The sound pattern layer is an abstraction sitting on top of the concept layer, and the logogram layer sits on top of the sound pattern layer. The presence of the sound pattern layer between the logogram layer and the concept layer is what allows the logograms to be divorced from the concepts they ultimately signify. It allows all the signs in the chain to be purely arbitrary.
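For readers who think in code, the layering can be sketched as two lookups chained together. This is only a toy model, not a claim about how readers actually process text: the dictionary names and the `concepts_for` function are hypothetical, and the only data in it is the 四 example from this post.

```python
# Toy model of the three layers: logogram -> sound pattern -> concept.
# All names here are hypothetical and exist only for illustration.

# Logogram layer: each written form points at a sound pattern, not at a concept.
LOGOGRAM_TO_SOUND = {"四": "*hljids"}

# Sound pattern layer: each sound pattern points at one or more concepts.
SOUND_TO_CONCEPTS = {"*hljids": ["four", "nostrils"]}


def concepts_for(logogram):
    """Resolve a logogram to its possible concepts via the sound pattern layer."""
    sound = LOGOGRAM_TO_SOUND[logogram]
    return SOUND_TO_CONCEPTS[sound]


print(concepts_for("四"))
```

Nothing in the logogram layer refers to a concept directly. That indirection is what frees the written form from having to resemble anything.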
How would a reader differentiate 四 the number and 四 the body part, then? At first, there was no visual distinction made, and readers simply relied on context. This introduces a different difficulty — ambiguity — but that is mitigated by the fact that “four” is a far more common word in most languages than “nostrils” is.
Over time, two things happened. One was that the word *hljids “nostrils” underwent semantic change and came to mean “mucus”. The other was that the association between 四 and “four” became so strong that when it was necessary to write “mucus”, writers started to disambiguate the logogram by adding 水, “water” (氵 in clerical and regular Chinese script), to the left of 四 to indicate the intended meaning of “mucus”. This created the logogram 泗, a modified rebus: the rebus signifies the sound pattern, and the modification (usually called the radical) indicates which of many possible concepts is intended. (Note that because of the rebus component, the resulting modified rebus is still an arbitrary sign.)
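One rough way to picture a modified rebus is as a pair: a rebus component that supplies the sound, and an optional radical that selects among the concepts sharing that sound. The sketch below is an illustration under that assumption. The `Logogram` class, the `read` function and the tiny lexicon are all hypothetical, and cover only the 四 and 泗 example above.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class Logogram:
    """Hypothetical model: a rebus for the sound, a radical for the meaning."""
    glyph: str
    rebus: str                     # component borrowed for its sound
    radical: Optional[str] = None  # semantic hint; None for a plain rebus


# Sound contributed by each rebus component (toy data from this post).
REBUS_SOUND = {"四": "*hljids"}

# (sound pattern, radical) -> concept. The radical disambiguates homophones.
LEXICON = {
    ("*hljids", None): "four",
    ("*hljids", "氵"): "mucus",
}


def read(logogram):
    """Return the (sound pattern, concept) pair a logogram stands for."""
    sound = REBUS_SOUND[logogram.rebus]
    return sound, LEXICON[(sound, logogram.radical)]


four = Logogram(glyph="四", rebus="四")
mucus = Logogram(glyph="泗", rebus="四", radical="氵")
print(read(four))
print(read(mucus))
```

Both logograms resolve to the same sound pattern, and only the radical changes which concept comes out.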
The average educated Mandarin speaker knows about 8,000 logograms, and the overwhelming majority of them are modified rebuses like the above. Interestingly, modern word processors have obviated the need to remember how to write all of them. People typing in Chinese type a romanised form of what they want to say, and a choice of logograms pops up; they only need to know how to recognise the logograms they want to use. Without regular handwriting, a phenomenon known as character amnesia occasionally surfaces, where the writer forgets how to write the logogram they meant to write. That’s not surprising, since the modern computer-based workflow effectively creates an alternative written layer, based on Mandarin’s relatively simple and constrained phonology, that competes with the expansive logographic system:
Writing is not a natural linguistic facility for humans. Children who grow up around language will learn to listen and speak, or to sign and understand sign language, but reading and writing have to be expressly taught. Somehow, the human brain can maintain a lexicon of tens of thousands of words in the form of sound patterns, but it cannot maintain a library of tens of thousands of separate written icons or symbols to represent those sound patterns. It has to reduce that written inventory to a few thousand at most, and even then vanishingly few writing systems have that many (remember, the English alphabet has just 26 letters).
Abstraction, which allows us to remove entire dimensions of temporarily irrelevant information, is what helps us do it.