# Extended Kohonen Maps

A Kohonen map is a Self-Organizing Map (SOM) used to order a set of high-dimensional vectors. It can be used to clarify relations in a complex set of data, by revealing some inherent order. This webpage gives access to software that can be used to create standard Kohonen maps, as well as some extensions.

Update 2001/11/16: Recompiled koh.exe for Windows allows for processing of much larger data files.

Contents:

Note: Images on this webpage use grey-scales to convey information. If you don't see 16 shades of grey in the image to the left some information will be lost.

## Literature

The primary source on Kohonen maps is:
Teuvo Kohonen.
Self-Organization and Associative Memory.
Springer-Verlag, Berlin, 3rd edition, 1989.
The extensions were first described in:
Peter Kleiweg.
Neurale netwerken: Een inleidende cursus met practica voor de studie Alfa-Informatica.
Syllabus, section 5.7: Analyse van complexe data.
Master's thesis, Rijksuniversiteit Groningen, 1996.
With software.

## Kohonen's algorithm

A Kohonen map is created using Artificial Neural Network techniques. A set of vectors is input repeatedly to a map consisting of units. Associated with each unit is a weight vector, initially consisting of random values. Units respond more or less to the input vector according to the correlation between the input vector and the unit's weight vector. The unit with the highest response to the input is allowed to learn, as well as some units in the neighborhood. The neighborhood decreases in size during the training period. Learning is done by adjusting the weights of the units by a small amount to resemble the input vector more.

The result of the training is that a pattern of organization emerges in the map. Different units learn to respond to different vectors in the input set, and units closer together will tend to respond to input vectors that resemble each other.

When the training is finished, the set of input vectors is applied to the map once more, marking for each input vector the unit that responds the strongest (is most similar) to that input vector.

To demonstrate this algorithm, Kohonen used the set of 32 vectors reproduced in the table below. The vectors have dimension 5, and are labeled A-Z,1-6.

 A B C D E F G H I J K L M N O P Q R S T U V W X Y Z 1 2 3 4 5 6 10000 20000 30000 40000 50000 31000 32000 33000 34000 35000 33100 33200 33300 33400 33500 33600 33700 33800 33310 33320 33330 33340 33610 33620 33630 33640 33621 33622 33623 33624 33625 33626

There is order in this set of vectors that can be displayed in a minimal spanning tree, obtained by linking all the vectors together, using the smallest possible square difference between linked vectors:

Using Kohonen's algorithm on this set of vectors generates the map below:

## Extensions

A traditional Kohonen map groups similar input vectors together. It does not show any discontinuities. Distance implies difference, but nearness does not imply resemblance. Compare the pair of units marked G and H with the pair marked A and 5, on the map above.

However, this information is available. All you have to do is make it visible. This can be done by calculating the square difference between neighboring units on the trained map, and use this value to color the edged separating the units. This is done in the map below, where dark lines indicate strong difference, and light lines indicate strong resemblance.

Additional information can be added. You can create a minimal spanning tree, and overlay this on the map, as is done on the map below.

Kohonen's example set of vectors is highly artificial. As a more natural example, I created a set of vectors from statistical data about occurrences of combinations of characters in Dutch words. The result is a set of 24 vectors of dimension 58. The characters q, x and y are omitted for their rarity in Dutch. The combination ij is considered to be one character, a vowel.

Applying nearest neighbor vector clustering, using my clustering software results in an image of a dendrogram, as displayed below. Note that creating a dendrogram for Kohonen's example set of vectors is meaningless because all pairs of nearest neighbors have an identical square difference. By the way, the algorithm to obtain nearest neighbor vector clustering is identical to the algorithm used to obtain the minimal spanning trees.

The result of creating a Kohonen map, with all the bells and whistles, from this set of vectors is shown below. One more additional type of information is made visible here. The lines of the minimal spanning tree indicate the difference between linked vectors. Closed lines indicate strong resemblance (e.g. l, r), open lines indicate strong difference (e.g. h, j).

Here is a large example. A spanning tree is omitted.

## Software

The following software is available:
koh.c
Program to create Kohonen maps from a set of vectors.
koh.exe
Compiled version of koh for Windows. Runs in DOS-window under Windows 95 and up.
koh-old.exe
Outdated, compiled version of koh for MS-DOS.
kohview.cpp
Program to view a Kohonen map created with the koh program on a VGA display. MS-DOS only. Compile with Borland C 3.1, link with egavga.obj.
kohview.exe
Compiled version of kohview.
test.vec
Set of vectors used by Kohonen to illustrate his algorithm.
NLchars.vec
Set of vectors with statistical data about Dutch characters.
many.vec
Set of 998 vectors, used to generated the large example.
Running the koh program results in a lot of files. One is a PostScript file that can be used independent of all the other files. Most, but not all, of the other files are used by the kohview program to display the Kohonen map on screen, if there's no PostScript printer/interpreter available.

The usage of koh is fairly simple. Just run the program without any parameters for a list of possible options. For the syntax of the input file, refer to one of the examples.

You may want to modify the PostScript file produced by the program. All relevant options are at the top of the file. DX and DY define width and height of one unit. The flags SHOW_GRAY and SHOW_LINKS define whether or not to use the extensions. ANGLE controls the angle of curved links. If you plan to include the PostScript image into another document, remember to adjust the values of the BoundingBox, if you have changed the size of the image.

Sometimes it may be necessary to adjust the positioning of the labels on the map as produces by the program. For instance, two labels may be unreadable because they are mapped to the same unit. The commands to draw the labels are at the end of the PostScript file. When two short labels are mapped to the same unit you could delete one line in the PostScript file, and combine the labels on the other line. For instance, the labels (A) and (B) could become (A, B). Longer labels could be placed above each other. The command for drawing the labels uses three arguments, X and Y coordinate and the label itself. The coordinates are given as integer values, but they need not be. When two labels overlap, you can add .15 to the Y coordinate (second argument) of one label, and subtract .15 from the Y coordinate of the other label.

The usage of kohview is as simple as possible. Just run the program with as an argument the name of one of the files produces by koh, but without the extension.

What was stated above about the positioning of labels in the PostScript file applies to the output of kohview as well. In this case you need to make similar adjustments to the file with the extension .top as produced by the koh program.