Two levels of semantics
"What does it mean?" Everybody asks this question from time to time. Let's ask now, "What is meaning?" Human language is no more than a symbolic system used to denote reality, but when we read a text, our brain does not recreate that reality. It creates yet another representation instead, which comes in the form of internal images. So we have three layers: reality itself, its representation in the nervous system, and its textual description. The various tricks of understanding are connected with the transformations from one layer to another.
Not much is known about the fine details of this internal representation, but some of them may be reflected in syntax. Language is a late acquisition of evolution, and understanding depends on the quality of nonverbal perception. The most distinct details come from the anatomy of the brain. The images of the human mind reside in the neocortex, which is essentially a two-dimensional structure: its thickness is just a few millimetres, while its area is many square centimetres. The neocortex is divided into several functionally specialized fields, which represent different sensory modalities as well as levels of abstraction. Language encodes images of all types; the same words may be used for auditory and visual perception. The primary fields, where the signal enters the neocortex, are the best studied. It has been established that external, real-world images still retain their shape there; further on, they may undergo substantial modifications. For example, a Fourier transform produces a completely different image. The primary visual cortex is the place where the two worlds meet. If we understand how language describes two-dimensional pictures, we will be able to apply the same principles to the whole of the internal world.
Now let's proceed to objective semantics. How does language describe the real world? To determine this, we should establish some correspondence between language features and real-world features, but how do we name the latter? A language is needed again; the problem loops back on itself. Let's leave this issue to philosophers and resort to an axiomatic approach. Language reflects our perception: the features of language are what we extract from the world. Maybe something else exists, but we don't know about it. So what can be found? The dictionary of a language falls into several parts of speech. Nouns represent items, verbs represent actions; these are the static versus the dynamic. Adjectives are attributes of nouns, while adverbs are attributes of verbs.
The dictionary contains the major part of a language - the names of natural phenomena - but language is not only the dictionary. Syntax is the set of rules by which words may compose sentences, and as a result the descriptive power increases substantially. This is pure combinatorics: suppose you have a sentence of 3 words and just 100 words in the dictionary. The total number of all-to-all combinations is 100^3 = 1,000,000. On the other hand, that is where great science may be hidden. Images may also be composed on the principles "from the general to the particular" and "from parts to the whole". If an item is represented by a single noun, it may be drawn in black and white. As soon as you attach a colour adjective, this corresponds to executing the command 'switch all pixels of <item> to <colour>'.
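The pixel-switching command can be sketched in a few lines. This is a minimal illustration under my own assumptions about the encoding (an image as a grid of colour names plus a mask marking the item's pixels), not a claim about how the brain actually stores pictures:

```python
# A tiny image as a grid of colour names, plus a mask saying which
# pixels belong to the item (say, the pixels of a "house").
image = [["black", "black"], ["black", "black"]]
house_mask = [[True, False], [False, True]]

def paint_item(image, mask, colour):
    """Execute 'switch all pixels of <item> to <colour>'."""
    for y, row in enumerate(mask):
        for x, inside in enumerate(row):
            if inside:
                image[y][x] = colour

paint_item(image, house_mask, "red")
# image is now [["red", "black"], ["black", "red"]]
```

Attaching the adjective "red" to the noun thus becomes a single operation over the pixels that the noun denotes.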
The formats of the internal and language representations are quite different. How do they fit each other? As in digital electronics, the substrate of images is digital: both the retina of the eye and the gray matter of the brain consist of neurons. Meanwhile, the images themselves are analog; even black-and-white pictures have gradations of gray, with smooth transitions from light to shadow. By contrast, language is clearly discrete, so a conversion is needed. A similar conversion is widely used in electronics, but there it is analog-to-digital conversion; analog-to-discrete is different. What methods may be imagined?
Natural language has numerals, so parametric methods are possible. Suppose you have the name "ellipse" with a numeric parameter, its eccentricity: by varying it you can gradually change the shape from a straight line to a circle. Likewise, discrete parameters may define subclasses. Imagine a concrete one-storey building. Now let it be a log cabin. Now - a skyscraper. These are different variants of the house. Digitization is also used, only with a small number of gradations. For example, visible light has a continuous spectrum: any wavelength within the range is possible, but language divides it into several discrete colours. Frequency methods are possible as well. Suppose you have a text with the names of two persons scattered throughout it. Whom is this text about? Count the occurrences of each name and compare the counts. If you change the frequencies, the answer may change as well.
Images were eventually implemented in programming languages through Object-Oriented Programming, and a similar approach is clearly present in natural language. C++ implements objects, but their prototype was already present in C as the 'struct' data type. A 'struct' may be used to represent the result of parsing, so a sentence of natural language already represents an image. A more thorough description takes several sentences. Like programming languages, natural language groups sentences into paragraphs, chapters, and books. Objects may be embedded within each other.
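The 'struct' idea can be mirrored in a short sketch; the field names below are my own illustration of a parse result, not a fixed or standard format:

```python
from dataclasses import dataclass, field

# The C 'struct' idea: the result of parsing one sentence, held as a
# single record. Field names are illustrative, not a fixed format.
@dataclass
class ParsedSentence:
    subject: str
    verb: str
    obj: str
    attributes: dict = field(default_factory=dict)  # adjectives per word

sentence = ParsedSentence(subject="book", verb="stands", obj="shelf",
                          attributes={"book": ["red"]})

# A paragraph is then a group of such records, and records may hold
# other records: objects embedded within objects.
paragraph = [sentence]
```

The record is the "image" of the sentence; grouping records gives paragraphs, and nesting them gives objects within objects.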
How may meaning be represented in the computer? Of course, the best variant would be to implement the internal images directly, but for high-quality representations this is rather resource-consuming. In addition, we don't know the details. There is another solution. As we have already seen, language describes not reality directly but the result of its perception - that is, the images we discuss here. The meaning of a text is yet another text. Then what's the difference? The trick is that perception is hierarchical, with each successive stage more abstract, so the second text will represent an image on the next level of the cognitive system. The main advantage is that such a solution retains all the necessary details - those we already know and those as yet unknown.
What transformations are possible? The simplest is normalization. In order for several users to work with the same knowledge base, all synonyms should be replaced by a single word - the main name of the cluster. Another method is inference. Suppose the fact is: "I put book on shelf." The meaning will be: "Book stands on shelf."
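Both transformations can be sketched in a few lines; the synonym table and the single inference rule are my own illustrative assumptions, not a real knowledge base:

```python
import re

# Normalization: replace every synonym with the main name of its
# cluster, so all users of the knowledge base share one vocabulary.
# The table is illustrative.
SYNONYMS = {"sofa": "couch", "settee": "couch"}

def normalize(words):
    return [SYNONYMS.get(w, w) for w in words]

# Inference: one hand-written rule for the example fact:
# "I put <X> on <Y>." entails "<X> stands on <Y>."
def infer(fact):
    m = re.match(r"I put (\w+) on (\w+)\.", fact)
    return f"{m.group(1).capitalize()} stands on {m.group(2)}." if m else None

print(normalize(["red", "settee"]))   # ['red', 'couch']
print(infer("I put book on shelf."))  # Book stands on shelf.
```

A real system would need many such rules; the point is only that both normalization and inference map one text onto another, more canonical text.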
Copyright (c) I. Volkov, January 27, 2014