PATTERN RECOGNITION USING KOHONEN MAPS AND MULTI-LAYER PERCEPTRONS

INFORMATICA 4/91

Keywords: multi-layer perceptrons, initialization, Kohonen map, clustering

Tjasa Mesko*
Faculty of Electrical Engineering and Computer Science, University of Ljubljana

Multi-layer perceptrons (MLPs) are now widely used for pattern recognition tasks such as speech recognition, handwritten character recognition, and face recognition. They have been shown to generalize well to unseen data. Another kind of neural network, the self-organizing feature map, has also occasionally been applied to pattern classification, but with less success. This study investigates how self-organizing feature maps can be used in combination with MLPs as a tool for initializing the weights of an MLP. The purpose of the research is to reduce the amount of supervised training required to train MLPs.

PREPOZNAVANJE VZORCEV S KOHONENOVO MAPO IN Z VEČNIVOJSKIMI PERCEPTRONI. [Pattern recognition with a Kohonen map and multi-layer perceptrons.] Multi-layer perceptrons have recently been used frequently in pattern recognition (for example of speech), handwriting recognition, face recognition, and similar tasks. Another kind of neural network, the Kohonen map, has also occasionally been applied to similar problems, but the results were poorer. This article examines the possibility of initializing the hidden-layer weights of a three-layer perceptron. The purpose of this research is to shorten the perceptron's training time.

1 Kohonen Self-Organizing Feature Maps

The self-organizing map belongs to the category of neural networks that use unsupervised training: each time a new input is presented to the map, the desired output is unspecified. This type of neural network is used to perform data compression, such as vector quantization, and, as will be explained later, to reduce the amount of supervised training.
A vector quantizer is a mapping $q$ that assigns to each input vector $x = (x_1, x_2, \ldots, x_N)$ a codebook vector $c_i = (c_{i1}, c_{i2}, \ldots, c_{iN})$, $c_i = q(x)$, drawn from a finite set of codebook vectors $Q = \{c_1, c_2, \ldots, c_M\}$, where $M$ is the number of codebook vectors. The quantizer $q$ is completely described by the set of codebook vectors, and it divides the input vector space into clusters

$$C_i = \{x : q(x) = c_i\} \qquad (1)$$

of input vectors which are mapped onto the $i$-th codebook vector.

*A young researcher, employed at PAREX, d.o.o.

The distortion caused by reproducing an input vector $x$ by a codebook vector $c_i$ is given by $d(x, c_i)$, where $d$ is assumed to be the Euclidean distance, defined by equation 2:

$$d(x, c_i) = \sqrt{\sum_{n=1}^{N} (x_n - c_{in})^2} \qquad (2)$$

Kohonen's algorithm creates a vector quantizer by adjusting vectors which are typically arranged in a two-dimensional grid (usually a square or a hexagonal lattice). In this study square maps will be used (see figure 1) and the map vectors will be represented by their map coordinates $(i, j)$. The vector at position $(i, j)$ in the map will be addressed in two ways:

- $m_{ij}$, $(i, j) = (1, 1), \ldots, (I, J)$
- $c_v$, $v = 1, 2, \ldots, I \times J$

where $m_{ij}$ equals $c_{(i-1)J + j}$ and $I$, $J$ are the map sizes.

Figure 1: Locations of map vectors in a square lattice. Vector $m_{ij}$ can also be addressed as $c_v$, where $v = (i-1)J + j$.

The vector quantizer function $q(x)$ corresponding to a Kohonen map selects the codebook vector $c_v$ which is closest to $x$:

$$d(x, c_v) = \min_k d(x, c_k), \quad k = 1, 2, \ldots, I \times J$$

In order to create the best codebook vectors, iterative training is performed. During each iteration, a neighbourhood is defined around each vector of the map, as shown in figure 2. The neighbourhood $NE(t)$ slowly decreases with time. At the beginning, the map vector components are initialized to small random values. Then, the following iterative procedure is applied whenever a new input vector $x(t)$ is presented ($t$ is the iteration number).

1.
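As an illustration, the quantizer function $q(x)$, which picks the codebook vector minimizing the Euclidean distance of equation 2, can be sketched in Python with NumPy. The map size and the random codebook below are arbitrary choices for the example, and indices are 0-based rather than the paper's 1-based convention:

```python
import numpy as np

def nearest_codebook(x, codebook):
    """Return the index v of the codebook vector closest to x,
    using the Euclidean distance of equation 2."""
    dists = np.sqrt(((codebook - x) ** 2).sum(axis=1))
    return int(np.argmin(dists))

# A 3x3 square map (I = J = 3) of 2-dimensional codebook vectors,
# stored row by row so that m_ij corresponds to c_{(i-1)*J + j}.
I, J, N = 3, 3, 2
rng = np.random.default_rng(0)
codebook = rng.random((I * J, N))

x = np.array([0.5, 0.5])
v = nearest_codebook(x, codebook)   # flat index v (0-based here)
i, j = divmod(v, J)                 # back to map coordinates (i, j)
```

The flat index and the grid coordinates are interchangeable addresses for the same vector, mirroring the $m_{ij}$ / $c_v$ notation above.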
Compute the Euclidean distances $d(x(t), c_k(t))$ to all nodes $c_k(t)$, $k = 1, 2, \ldots, I \times J$, according to equation 2.

2. Select the node producing the minimum distance as the winning node $c_v$.

Figure 2: Topological neighbourhood at different times as the feature map is formed. $NE_v(t)$ is the set of nodes considered to be in the neighbourhood of a node $c_v$ at time $t$. The neighbourhood is decreased in size as time increases. In this example, $0 < t_1$
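The winner-selection steps above, combined with a neighbourhood update, can be sketched in Python. Note that the excerpt breaks off before stating the update rule, so the update toward $x(t)$ and the exponential decay schedules used here are assumptions based on the standard Kohonen formulation, not the paper's own equations:

```python
import numpy as np

def train_step(codebook, x, t, I, J, radius0=2, alpha0=0.5, decay=100.0):
    """One Kohonen training iteration: winner selection (steps 1-2),
    then an update of all nodes in the square neighbourhood NE_v(t).
    The update rule and decay schedules are assumed, not from the paper."""
    # Steps 1-2: the winning node is the codebook vector closest to x(t).
    v = int(np.argmin(((codebook - x) ** 2).sum(axis=1)))
    vi, vj = divmod(v, J)

    # Neighbourhood radius and learning rate shrink as t increases.
    radius = max(0, int(round(radius0 * np.exp(-t / decay))))
    alpha = alpha0 * np.exp(-t / decay)

    # Move every node inside NE_v(t) towards the input x(t).
    for i in range(I):
        for j in range(J):
            if abs(i - vi) <= radius and abs(j - vj) <= radius:
                k = i * J + j
                codebook[k] += alpha * (x - codebook[k])
    return v

# Example: present one input to a 3x3 map of 2-dimensional vectors.
codebook = np.zeros((3 * 3, 2))
winner = train_step(codebook, np.array([1.0, 1.0]), t=0, I=3, J=3)
```

Because the radius shrinks with $t$, early iterations order the whole map coarsely while later ones fine-tune individual nodes, which is what figure 2 depicts.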