2017-04-15

Introduction

SOM is based on competitive learning where output neurons compete with each other to be activated. (Kohonen, 1982)

• Output neuron that activates is called winner-takes-all neuron
• Lateral inhibition is one way to inplement competition for map formation
• In SOM, neurons are placed on a lattice, on which a feature map is created, a meaningful coordinate system for different features.
• The lattice thus forms a topographic map where the spatial location on the lattice is indicative of the input features.

Two models

Willshaw-von der Malsburg model:

• input neurons arranged in 2D lattice
• output in 2D lattice
• Lateral excitation/inhibition (Mexican hat) gives rise to soft competition
• Normalized Hebbian learning
• Biological motivation

Kohonen model

• input of any dimension
• output neurons in 1D, 2D or 3D lattice
• Relaxed winner-takes-all
• Competitive learning rule
• Computational motivation

Principles

• Competition: One winner
• Cooperation: Neuron near-by the winner on the lattice get a chance to adapt.
• Adaption: Winner and neighbors increase their func values.

Redundancy in the input is needed.

• Redundancy, can predict based on a know pixel, can reduce dimension
• Structure
• Information content relative to channel capacity

Algorithm

1. Randomly initialized weight vectors $\textbf{w}_i$
2. Randomly sample input vector $\textbf{x}$
3. Find Best Matching Unit (BMU): $i(\textbf{x})=\mbox{argmin}_j \Vert \textbf{x} -\textbf{w}_j \Vert$
4. Update weight vectors: $\textbf{w}_j \leftarrow \textbf{w}_j + \eta h(j,i(\textbf{x}))(\textbf{x}-\textbf{w}_j)$
• $\eta$: learning rate
• $h(j,i(\textbf{x}))$: neighborhood function of BMU
5. Repeat steps 2-4

Neighborhood functions

• Gaussian: $h(j,i(\textbf{x})) = \exp (-\Vert \textbf{r}_j - \textbf{r}_{i(\textbf{x})} \Vert^2/(2\sigma^2))$
• Flat: $h(j,i(\textbf{x})) = 1 \mbox{ if } \Vert \textbf{r}_j - \textbf{r}_{i(\textbf{x})} \Vert \le \sigma; 0 \mbox{ otherwise }$

Params:

• $\sigma$ is called the neighborhood radius
• $\textbf{r}_j$ is the location of unit $j$ on the lattice

Training Tips

• Start with large neighborhood radius. Gradually decrease radius to a small value $\sigma(n) = \sigma_0 \exp\left( - \frac{n}{\tau_1}\right)$
• Start with high learning rate $\eta$ and decrease it like radius

• Self-organization or ordering phase: High learning rate and large neighborhood radius
• Convergence phase: Low learning rate, small neighborhood radius (one or zero)

Performance measures

• Quantization error : Average distance between each data vector and its BMU $\epsilon_Q=\frac{1}{N}\sum_{j=1}^{N} \Vert \textbf{x}_j - \textbf{w}_i(\textbf{x}_j)\Vert$
• Topographic error: The proportion of all data vectors for which first and second BMUs are not adjacent units. $\epsilon_T = \frac{1}{N}\sum_{j=1}^{N} u(\textbf{x}_j)$
• $u(\textbf{x})=1$ if the 1st and 2nd BMUs are not adjacent
• $u(\textbf{x})=0$ otherwise

Summary

• Hebbian learning rule with forgetting term
• Input generated according to a certain probability distribution on a continuous input space
• Topology of network form on the discrete lattice
• Time varying neighborhood function around winner and learning rate

Properties

• Approximation of the input space: collection of weight vectors
• Topological ordering: Near-by neurons on the lattice represent similar input features
• Density matching: More neurons are recruited to represent dense area in the input space
• Feature selection: Select best features to approximate the underlying distribution

High-D inputs

SOM can be trained with inputs of arbitrary dimension. It can reduce dimensionality from N-D to 2-D. Can be used to extract topological features and visualize data.

Application

• Data clustering and visualization
• Optimization problems
• Semantic maps
• NLP
• Preprocessing for signal and image-processing
• Hand-written recognition
• Phonetic map for speech recognition

Vector Quantization

Exploits the structure in the input distribution for the purpose of data compression. SOM can be used to learn it.

• Input space is partitioned into a number of distinct regions and for each region a reconstruction vector is defined.
• A new input will be represented by the reconstruction vector of the region it falls into.

A vector quantizer that minimizes encoding distortion is called a Voronoi or nearest-neighbor quantizer.

Learning

• Train with SOM in unsupervised mode.
• Then tune weight vectors in a supervised mode:
• If class of input vector matches the class of the best matching weight vector $\textbf{w}_c(n+1)=\textbf{w}_c(n) + \alpha_n[\textbf{x}_i-\textbf{w}_c(n)]$
• If class of input vector does not match the class of the best matching weight vector $\textbf{w}_c(n+1)=\textbf{w}_c(n) - \alpha_n[\textbf{x}_i-\textbf{w}_c(n)]$