Mathematics of image recognition


The task may be formulated as follows. There is a black background with a white figure on it. The figure is hollow: only its outline is drawn. We will consider the most common 2D geometric shapes: a circle, an ellipse, a square, a rectangle, a parallelogram and a triangle. The figure may be of any size, located anywhere in the image and rotated by any angle. The task for the computer is to determine automatically the type of the figure and its parameters.

The solution is based on neural nets, more specifically on a simple perceptron (a 3-layer or even a 2-layer one is enough). The net learns associations between these shapes and their names. Since the figures are hollow, their outlines share almost no pixels, so even the simplest 2-layer net can distinguish between them. A neural net can memorize any shape, but the shape must be presented in an absolutely standard form.
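The text does not specify the network in detail. The following is a minimal sketch assuming that "2-layer" means an input layer (one input per pixel of the normalized image) connected directly to an output layer (one neuron per shape name), trained with the classic multiclass perceptron rule. The class name `Perceptron` and its methods are hypothetical illustrations, not the author's code.

```python
import numpy as np

class Perceptron:
    """Hypothetical 2-layer perceptron: every input pixel is connected to
    every output neuron, and there is one output neuron per shape name."""

    def __init__(self, n_inputs, labels):
        self.labels = list(labels)
        self.w = np.zeros((len(self.labels), n_inputs))  # one weight row per output
        self.b = np.zeros(len(self.labels))

    def scores(self, x):
        return self.w @ x + self.b

    def predict(self, image):
        # The image must already be normalized (standard size and pose)
        x = np.asarray(image, dtype=float).ravel()
        return self.labels[int(np.argmax(self.scores(x)))]

    def train(self, samples, epochs=20):
        """samples: list of (normalized_image, label) pairs."""
        for _ in range(epochs):
            errors = 0
            for image, label in samples:
                x = np.asarray(image, dtype=float).ravel()
                target = self.labels.index(label)
                guess = int(np.argmax(self.scores(x)))
                if guess != target:
                    # Multiclass perceptron rule: reinforce the correct
                    # output, weaken the output that wrongly won
                    self.w[target] += x
                    self.b[target] += 1
                    self.w[guess] -= x
                    self.b[guess] -= 1
                    errors += 1
            if errors == 0:  # all training shapes are recognized
                break
        return self
```

Because hollow outlines overlap in very few pixels, the weight vectors learned for different shapes barely interfere, which is why such a simple net can separate them once the input is normalized.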


For this purpose we add a preprocessing step. The program must find the minimal rectangle that encloses the figure and then scale it so that its larger side is 100 pixels long, which is exactly what you see in the picture above. Already after this step we discover that different figures behave differently, and four categories (numbered 0 to 3) can be distinguished. The circle is the simplest (category 0): after this preprocessing it always looks the same, whatever its original position, size or rotation. We will not consider the line thickness for now. The square is already more difficult, because it may be rotated. This is category 1.
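A minimal sketch of this preprocessing step, assuming the image is a 2D array in which white (figure) pixels are nonzero. The function name `normalize_bbox` is hypothetical, and placing the figure in the top-left corner of the 100×100 frame (rather than centering it) is an arbitrary choice.

```python
import numpy as np

def normalize_bbox(image, size=100):
    """Crop the figure to its minimal axis-aligned enclosing rectangle and
    scale it uniformly so that the larger side spans `size` pixels."""
    img = np.asarray(image, dtype=bool)
    ys, xs = np.nonzero(img)
    if ys.size == 0:
        raise ValueError("image contains no white pixels")
    # Minimal enclosing rectangle of the white pixels
    y0, y1 = ys.min(), ys.max()
    x0, x1 = xs.min(), xs.max()
    h, w = y1 - y0 + 1, x1 - x0 + 1
    # One common scale factor for both axes keeps the proportions intact
    scale = (size - 1) / max(h - 1, w - 1, 1)
    out = np.zeros((size, size), dtype=bool)
    # Forward-map every white pixel to the nearest pixel of the standard frame
    ny = np.rint((ys - y0) * scale).astype(int)
    nx = np.rint((xs - x0) * scale).astype(int)
    out[ny, nx] = True
    return out
```

Forward mapping keeps every outline pixel when shrinking, but when a small figure is enlarged the outline becomes dotted; a fuller implementation would interpolate between outline points or thicken the line.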


For such shapes we need one more preprocessing step, rotation; only after it can they be scaled to the standard size. Category 2 has one parameter; for the rectangle it is the ratio of its sides. Here we need two additional preprocessing steps: rotation and then non-uniform scaling (stretching the two sides by different factors), which turns the rectangle into the standard square. Finally, category 3 has more than one parameter. Triangles and parallelograms fall into this group.
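The text does not say how the rotation angle is found. One possible approach, assumed here, is a brute-force search for the rotation that minimizes the area of the axis-aligned bounding box; for squares and rectangles this aligns the sides with the axes. After that, the width and height are scaled independently, which removes the rectangle's only parameter (its side ratio). The function names and the 0.5-degree search step are hypothetical.

```python
import numpy as np

def min_area_angle(points, step_deg=0.5):
    """Rotation angle (radians, in [0, 90) degrees) that minimizes the area
    of the axis-aligned bounding box of the (x, y) `points`."""
    best_angle, best_area = 0.0, np.inf
    for deg in np.arange(0.0, 90.0, step_deg):
        t = np.deg2rad(deg)
        c, s = np.cos(t), np.sin(t)
        x = c * points[:, 0] - s * points[:, 1]
        y = s * points[:, 0] + c * points[:, 1]
        area = np.ptp(x) * np.ptp(y)
        if area < best_area:
            best_angle, best_area = t, area
    return best_angle

def rotate_and_stretch(image, size=100, step_deg=0.5):
    """Rotate the figure so its sides are axis-aligned, then scale width and
    height independently so that both span `size` pixels.
    Returns (normalized image, rotation in degrees, long/short side ratio)."""
    ys, xs = np.nonzero(np.asarray(image, dtype=bool))
    pts = np.column_stack([xs, ys]).astype(float)
    t = min_area_angle(pts, step_deg)
    c, s = np.cos(t), np.sin(t)
    x = c * pts[:, 0] - s * pts[:, 1]
    y = s * pts[:, 0] + c * pts[:, 1]
    w, h = np.ptp(x), np.ptp(y)
    # Non-uniform scaling: a separate factor for each axis
    nx = np.rint((x - x.min()) / max(w, 1e-9) * (size - 1)).astype(int)
    ny = np.rint((y - y.min()) / max(h, 1e-9) * (size - 1)).astype(int)
    out = np.zeros((size, size), dtype=bool)
    out[ny, nx] = True
    ratio = max(w, h) / max(min(w, h), 1e-9)  # the category-2 parameter
    return out, np.rad2deg(t), ratio
```

Searching only over 0 to 90 degrees is enough here: a rectangle turned by an extra 90 degrees becomes the same standard square after stretching, and the side ratio is reported as long side over short side.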


For the latter, we will need the following pipeline of transformations:


Here normalization becomes too complicated, and methods that do not use neural nets are probably more suitable. For example, we can formulate a definition of such a shape that allows a completely analytical method of recognition. Say, a parallelogram is a polygon with 4 sides forming 2 pairs of parallel sides. Such definitions are easy to check once a hypothesis about the shape type has been formulated, whereas with the neural approach we simply present a normalized input and get the answer in one step, without checking many hypotheses.
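As an illustration of such an analytical check, here is a minimal sketch that tests the parallelogram definition above. It assumes the 4 corner points have already been extracted from the image and are given in order around the polygon; extracting them is the hypothesis-forming part and is not shown. The function name `is_parallelogram` and the tolerance are hypothetical.

```python
import math

def is_parallelogram(vertices, tol=1e-6):
    """True if the polygon given by 4 ordered (x, y) corners has 2 pairs of
    parallel opposite sides (and is not degenerate)."""
    if len(vertices) != 4:
        return False
    # Side i is the vector from corner i to corner i + 1
    edges = [(vertices[(i + 1) % 4][0] - vertices[i][0],
              vertices[(i + 1) % 4][1] - vertices[i][1]) for i in range(4)]

    def parallel(a, b):
        # 2D cross product is zero for parallel vectors; dividing by the
        # lengths makes the tolerance independent of the figure's size
        cross = a[0] * b[1] - a[1] * b[0]
        norm = math.hypot(*a) * math.hypot(*b)
        return norm > 0 and abs(cross) <= tol * norm

    return (parallel(edges[0], edges[2]) and parallel(edges[1], edges[3])
            and not parallel(edges[0], edges[1]))  # reject collapsed shapes
```

For example, `is_parallelogram([(0, 0), (4, 0), (5, 2), (1, 2)])` is true, while a trapezoid such as `[(0, 0), (4, 0), (3, 2), (1, 2)]` fails because only one pair of sides is parallel. With pixel-based corners a larger `tol` would be needed.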

This classification of possible variants suggests an efficient algorithm for detecting elementary shapes. We perform the first normalization step and check whether the result is a circle. If it is not, we proceed to the next category.
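The step-by-step procedure can be sketched as a cascade. This is a minimal sketch under assumptions not stated in the text: each category is represented by a pair of hypothetical callables, `normalize(image)` performing that category's preprocessing and `classify(x)` returning a `(label, confidence)` pair (for example, from a net trained on that category's standard shapes), and a fixed confidence threshold decides when to stop.

```python
def cascade_recognize(image, stages, threshold=0.9):
    """Try the categories in order of increasing normalization effort.
    stages: list of (normalize, classify) pairs, category 0 first.
    Returns (category_index, label) for the first confident answer,
    or None if no stage recognizes the figure."""
    for category, (normalize, classify) in enumerate(stages):
        x = normalize(image)          # e.g. scaling, then rotation, then stretching
        label, confidence = classify(x)
        if confidence >= threshold:   # recognized: skip the costlier stages
            return category, label
    return None
```

Ordering the stages from cheapest to most expensive means that simple figures such as circles are recognized after a single normalization step, and the costlier steps run only when they are needed.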



Copyright (c) I. Volkov, January 02, 2019