*Mathematics of image recognition*

The task may be formulated as follows. There is a black
background with a white figure on it. The figure is hollow: only its outline is
drawn. We will consider the most common 2D geometric shapes: a circle, ellipse,
square, rectangle, parallelogram, and triangle. The figure may be of any size,
located in any part of the image, and rotated by any angle. The task for the
computer is to determine automatically the type of the figure and its parameters.

The solution is based on neural nets, more specifically
on a simple perceptron (3 or even 2 layers are enough). The net learns
associations between these shapes and their names. Since the figures are
hollow, they have almost no pixels in common, so even the simplest 2-layer net
can distinguish between them. A neural net can memorize any shape, but the shape
must be presented in an absolutely standard form.
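As an illustration of the 2-layer idea, here is a minimal sketch, assuming
NumPy and small binary images that are already in standard form. It wires the
input pixels directly to one output per shape name and trains with the classic
multiclass perceptron rule. The names `train_perceptron` and `classify` are
hypothetical, and this is not necessarily the exact net used in the program.

```python
import numpy as np

def train_perceptron(patterns, labels, n_classes, epochs=20, lr=0.1):
    """Train a 2-layer net (pixels wired directly to one output per class).

    Assumption: `patterns` are equally sized binary images in standard form,
    `labels` are integer class indices in range(n_classes).
    """
    x = np.asarray(patterns, dtype=float).reshape(len(patterns), -1)
    weights = np.zeros((n_classes, x.shape[1]))
    bias = np.zeros(n_classes)
    for _ in range(epochs):
        for xi, target in zip(x, labels):
            predicted = int(np.argmax(weights @ xi + bias))
            if predicted != target:
                # Perceptron rule: pull the correct output towards the
                # pattern and push the wrongly winning output away from it.
                weights[target] += lr * xi
                bias[target] += lr
                weights[predicted] -= lr * xi
                bias[predicted] -= lr
    return weights, bias

def classify(pattern, weights, bias):
    """Return the index of the output unit with the highest activation."""
    x = np.asarray(pattern, dtype=float).ravel()
    return int(np.argmax(weights @ x + bias))
```

Because two hollow outlines overlap in only a few pixels, each output ends up
with positive weights mostly on "its own" pixels, which is why such a shallow
net can be sufficient for standardized inputs.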

To bring a figure to the standard form, we implement a
preprocessing step. The program must determine the minimal rectangle that encloses
the figure, then scale it so that the larger side is 100 pixels, which is
exactly what you see above. Already after this step we discover that different
figures have different properties. Four categories, numbered 0 to 3, can be
distinguished. The circle is the simplest (category 0). After this preprocessing
it always looks the same (we will not consider the thickness of the line for
now). The square is already more difficult: it may be rotated. This is category 1.
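The first preprocessing step can be sketched as follows. This is a minimal
example, assuming NumPy and a binary image stored as a 2D array in which white
pixels are nonzero. The function name `normalize_to_box` and the forward pixel
mapping are illustrative choices, not the program's actual code.

```python
import numpy as np

def normalize_to_box(image, target=100):
    """Crop to the figure's bounding box and rescale uniformly so that the
    larger side of the box becomes `target` pixels."""
    ys, xs = np.nonzero(image)                 # coordinates of white pixels
    if ys.size == 0:
        raise ValueError("image contains no figure")
    h = ys.max() - ys.min() + 1                # bounding-box height
    w = xs.max() - xs.min() + 1                # bounding-box width
    scale = target / max(h, w)                 # same factor for both axes
    new_h = max(1, round(h * scale))
    new_w = max(1, round(w * scale))
    # Forward mapping: move every white pixel to its scaled position.
    # Assumption: the figure is shrunk or only mildly enlarged; strong
    # enlargement would leave gaps in the outline.
    ty = np.round((ys - ys.min()) * (new_h - 1) / max(h - 1, 1)).astype(int)
    tx = np.round((xs - xs.min()) * (new_w - 1) / max(w - 1, 1)).astype(int)
    out = np.zeros((new_h, new_w), dtype=np.uint8)
    out[ty, tx] = 1
    return out
```

Mapping the white pixels forward, rather than resampling the image, keeps a
one-pixel outline intact when the figure is shrunk. A production version would
more likely rely on an image library's resize routine.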

For shapes like the square, we need one more preprocessing
step: rotation. Only then can they be scaled to the standard
dimension. Category 2 has 1 parameter. For the rectangle this is the
ratio of its sides. So we need 2 additional preprocessing steps: rotation,
then non-uniform scaling. This turns the rectangle into the standard square.
Finally, category 3 has more than 1 parameter. Triangles and parallelograms
fall into this group.
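Rotation followed by non-uniform scaling might look like the sketch below. It
assumes NumPy and a figure given as an (N, 2) array of (x, y) outline points.
The upright orientation is found by a brute-force search for the rotation with
the smallest bounding box, which is a technique swapped in for illustration;
the text does not say how the rotation angle is actually determined. All
function names are hypothetical.

```python
import numpy as np

def rotate_points(points, angle_deg):
    """Rotate an (N, 2) array of (x, y) points about the origin."""
    a = np.deg2rad(angle_deg)
    rot = np.array([[np.cos(a), -np.sin(a)],
                    [np.sin(a),  np.cos(a)]])
    return points @ rot.T

def upright_angle(points, step=0.5):
    """Search [0, 90) degrees for the rotation that minimises the area of
    the axis-aligned bounding box (90 degrees suffice for a rectangle)."""
    best_angle, best_area = 0.0, np.inf
    for angle in np.arange(0.0, 90.0, step):
        extent = np.ptp(rotate_points(points, angle), axis=0)
        area = extent[0] * extent[1]
        if area < best_area:
            best_angle, best_area = angle, area
    return best_angle

def normalize_rectangle(points):
    """Rotate the figure upright, then scale each axis independently so
    the bounding box becomes the unit square.

    Returns the normalised points and the side ratio (long / short),
    which is the single parameter of a category 2 shape.
    """
    upright = rotate_points(points, upright_angle(points))
    upright = upright - upright.min(axis=0)    # move the box to the origin
    extent = upright.max(axis=0)               # box width and height
    ratio = extent.max() / extent.min()
    return upright / extent, ratio
```

The side ratio is returned separately because the non-uniform scaling destroys
it: after normalization every rectangle looks like the same square.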

For category 3, we will need the following pipeline of
transformations:

Normalization becomes too complicated. Probably some
other methods without neural nets are more suitable here. For example, we can
formulate a definition of such a shape that makes a completely analytical
method of recognition possible. Say, the parallelogram is a
polygon with 4 sides that form 2 pairs of parallel sides. Such definitions
are easy to check once a hypothesis about the shape type has been formulated, but
with the neural approach we can just present a normalized input and get the
output in one step, without checking many hypotheses.
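The analytical check for the parallelogram hypothesis is short. The sketch
below assumes that the 4 vertices have already been extracted from the image
and are listed in order around the polygon; that extraction step is not shown.
The function name and the tolerance are illustrative.

```python
import math

def is_parallelogram(vertices, rel_tol=1e-6):
    """Check the definition: 4 sides, both pairs of opposite sides parallel.

    Assumption: `vertices` is a sequence of (x, y) pairs listed in order
    around the polygon.
    """
    if len(vertices) != 4:
        return False
    # Edge vectors from each vertex to the next, wrapping around at the end.
    edges = []
    for i in range(4):
        x0, y0 = vertices[i]
        x1, y1 = vertices[(i + 1) % 4]
        edges.append((x1 - x0, y1 - y0))

    def parallel(a, b):
        # The 2D cross product vanishes for parallel vectors. It is compared
        # with the product of the lengths so the test does not depend on scale.
        cross = a[0] * b[1] - a[1] * b[0]
        norm = math.hypot(*a) * math.hypot(*b)
        return norm > 0 and abs(cross) <= rel_tol * norm

    return parallel(edges[0], edges[2]) and parallel(edges[1], edges[3])
```

With vertices measured from real pixels, `rel_tol` would have to be much
larger than the default, since detected corners are only accurate to about a
pixel.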

This classification of possible variants hints at an
efficient algorithm of elementary shape detection. We do the first step and
check whether the result is a circle. If it is not, we proceed to the next step.
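The control flow of such a cascade can be sketched as below. The stages
themselves (normalization functions and recognizers) are passed in as
hypothetical callables; only the "try the cheapest category first, stop at the
first match" logic is shown.

```python
def detect_shape(figure, stages):
    """Try the categories in order of increasing complexity.

    `stages` is a list of (normalise, recognise) pairs, one per category.
    Assumption: `normalise` maps the raw figure to that category's standard
    form, and `recognise` returns a shape name or None if nothing matches.
    """
    for normalise, recognise in stages:
        candidate = normalise(figure)     # always start from the raw figure
        label = recognise(candidate)
        if label is not None:
            return label                  # stop at the first category that fits
    return None                           # no category matched
```

Each stage starts from the raw figure rather than from the previous stage's
output, because the later categories need rotation before scaling.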

Copyright (c)
I. Volkov, January 02, 2019