Two levels of semantics


"What does it mean?" Everybody asks this question time by time. Let's ask now, "What is meaning?" Human language is no more than a symbolic system used to denote reality, but when we read a text, our brain doesn't create any reality. It creates yet another representation instead. This representation comes in the form of internal images. So we have three layers: the reality itself, its representation in the nervous system, and its textual description. Various tricks of understanding are connected to the transformations from one layer to another.


Not much is known about the fine details of this internal representation, but some of them may be reflected in syntax. Language is a late acquisition of evolution, and understanding depends upon the quality of nonverbal perception. The most distinct details come from the anatomy of the brain. The images of the human mind are located in the neocortex. This is a two-dimensional structure: its thickness is just a few millimetres, while its area is many square centimetres. The neocortex is divided into several functionally specialized fields. They represent different sensory modalities as well as levels of abstraction. Language encodes images of all types; the same words may be used for auditory and visual perception. The primary fields, where the signal enters the neocortex, are the best studied. It has been established that images of the external, real world still retain their shape here. Further on, they may undergo substantial modifications; the Fourier transform, for example, produces a completely different image. The primary visual cortex is the place where the two worlds meet. If we understand how language describes two-dimensional pictures, we will be able to apply the same principles to the whole of the internal world.

Now let's proceed to objective semantics. How does language describe the real world? To determine this, we should outline some correspondence between the features of language and the features of the real world, but how will we name the latter? A language is needed again; the problem loops back on itself. Let's leave this issue to philosophers and resort to the axiomatic approach. Language reflects our perception: the features of language are what we extract from the world. Maybe something else exists, but we don't know about it. So what can be found? The dictionary of a language falls into several parts of speech. Nouns represent items, verbs represent actions - the static versus the dynamic. Adjectives are attributes of nouns, while adverbs are attributes of verbs.

The dictionary contains the major part of a language - the names of natural phenomena - but language is not only the dictionary. Syntax is the set of rules by which words may compose sentences, and it increases the descriptive power substantially. This is pure combinatorics: with a dictionary of just 100 words, a sentence of 3 words already admits 100^3 = 1,000,000 all-to-all combinations. On the other hand, that is exactly where great science may be hidden. Images may also be composed on the principles "from the general to the particular" and "from the parts to the whole". If an item is represented by a single noun, it may be drawn in black-and-white. As soon as you attach a colour adjective, this corresponds to executing the command 'switch all pixels of <item> to <colour>'.
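The "colour adjective as a command" idea can be sketched in a few lines of Python. The image format here - a dictionary mapping each item to its pixel coordinates, plus a pixel-to-colour map - is an assumption made purely for illustration, not a claim about how the brain stores pictures.

```python
def apply_adjective(image, item, colour):
    """Execute 'switch all pixels of <item> to <colour>'."""
    for xy in image["items"][item]:
        image["pixels"][xy] = colour
    return image

# A two-pixel "house" drawn in black-and-white.
image = {
    "items": {"house": [(0, 0), (0, 1)]},
    "pixels": {(0, 0): "black", (0, 1): "black"},
}

# Attaching the adjective: "red house".
apply_adjective(image, "house", "red")
print(image["pixels"])  # {(0, 0): 'red', (0, 1): 'red'}
```

The noun selects a region of the picture; the adjective is an operation applied over that region.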


The formats of the internal and language representations are quite different. How do they fit each other? As in digital electronics, the substrate of images is discrete: both the retina of the eye and the gray matter of the brain consist of neurons. Meanwhile, the images themselves are analog; even a black-and-white picture has gradations of gray, with smooth transitions from light to shadow. By contrast, language is clearly discrete, so a conversion is needed. A similar conversion is widely used in electronics, but that is analog-to-digital conversion; analog-to-discrete is different. What methods may be imagined?

Natural language has numerals, so parametric methods are possible. Suppose you have the name "ellipse" with the numeric parameter of eccentricity: you can gradually change the shape from a straight line to a circle. Likewise, discrete parameters may define subclasses. Imagine a concrete one-storey building. Now let it be a log cabin. Now, a skyscraper. These are different variants of the house. Digitization is also used, only with a small number of gradations. For example, visible light has a continuous spectrum - any wavelength from the range is possible - but language divides it into several discrete colours. Frequency methods are possible as well. Suppose you have a text with the names of two persons scattered throughout it. Whom is this text about? Count the occurrences of each name and compare them. If you change the frequency of either, the answer may also change.
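Two of these methods are easy to sketch. The colour-band edges below are approximate round numbers chosen for illustration, and the sample text is invented; neither is meant as authoritative data.

```python
# 1. Discretization: the continuous visible spectrum (wavelength in nm)
#    is cut into a handful of named colours.
def colour_name(wavelength_nm):
    bands = [(450, "violet"), (495, "blue"), (570, "green"),
             (590, "yellow"), (620, "orange"), (750, "red")]
    for upper, name in bands:
        if wavelength_nm < upper:
            return name
    return "red"

print(colour_name(532))  # green - every wavelength in the band gets one word

# 2. Frequency method: whom is the text about? Count and compare.
def main_person(text, names):
    counts = {name: text.count(name) for name in names}
    return max(counts, key=counts.get)

text = "Alice met Bob. Alice smiled. Bob left. Alice stayed."
print(main_person(text, ["Alice", "Bob"]))  # Alice (3 occurrences vs 2)
```

Shifting the frequencies - adding more sentences about Bob - would flip the second answer, just as the text above describes.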


Images were eventually implemented in programming languages through Object-Oriented Programming, and a similar approach is clearly present in natural language. C++ implements objects, but their prototypes were already present in C as the 'struct' data type. A 'struct' may be used to represent the result of parsing, so a sentence of natural language already represents an image. A more thorough description contains several sentences; like programming languages, natural language groups sentences into paragraphs, chapters, and books. Objects may be embedded within each other.
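The 'struct' idea can be sketched with Python dataclasses standing in for C records. The field names (subject, verb, object, attribute) are an assumed parse layout, not a standard format; the point is only that a parsed sentence becomes a small nested structure, i.e. an image.

```python
from dataclasses import dataclass
from typing import Optional

@dataclass
class Phrase:
    head: str                        # a noun or a verb
    attribute: Optional[str] = None  # its adjective or adverb, if any

@dataclass
class Sentence:
    subject: Phrase
    verb: Phrase
    obj: Optional[Phrase] = None     # structures embedded within structures

# "The small cat quickly caught the gray mouse."
parsed = Sentence(
    subject=Phrase("cat", "small"),
    verb=Phrase("caught", "quickly"),
    obj=Phrase("mouse", "gray"),
)
print(parsed.subject)  # Phrase(head='cat', attribute='small')
```

Grouping several such records into a list would model a paragraph, continuing the same composition "from parts to the whole".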


How may meaning be represented in the computer? Of course, the best variant would be to implement the internal images directly, but for high-quality representations this is rather resource-consuming; in addition, we don't know the details. There is another solution. As we have already seen, language describes not reality directly, but the result of its perception - that is, the images which we discuss here. The meaning of a text is yet another text. Then what's the difference? The trick is that perception is hierarchical, with each next stage being more abstract. The second text will represent an image on the next level of the cognitive system. The main advantage is that such a solution retains all the necessary details - both those which we already know and those as yet unknown.

What transformations are possible? The simplest is normalization. In order that several users can work with the same knowledge base, all synonyms should be replaced by a single word - the main name of the cluster. Another method is inference. Suppose the fact is: "I put book on shelf." The meaning will be: "Book stands on shelf."
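Both transformations can be sketched in a few lines. The synonym cluster and the single inference rule below are illustrative assumptions; a real knowledge base would carry many clusters and many rules.

```python
# Normalization: every synonym is replaced by the main name of its cluster.
SYNONYMS = {"sofa": "couch", "settee": "couch"}  # "couch" is the main name

def normalize(words):
    return [SYNONYMS.get(w, w) for w in words]

print(normalize(["red", "settee"]))  # ['red', 'couch']

# Inference: the rule "X put Y on Z" entails "Y stands on Z".
def infer(fact):
    words = fact.split()
    if len(words) == 5 and words[1] == "put" and words[3] == "on":
        return f"{words[2]} stands on {words[4]}"
    return None  # no rule matched

print(infer("I put book on shelf"))  # book stands on shelf
```

The inferred sentence is itself ordinary text - the "second text" on the next level that the previous section describes.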


Copyright (c) I. Volkov, January 27, 2014