2017-04-15

## Introduction

SOM is based on competitive learning, in which output neurons compete with each other to be activated (Kohonen, 1982).

• The output neuron that becomes active is called the winner-takes-all neuron
• Lateral inhibition is one way to implement the competition needed for map formation
• In a SOM, neurons are placed on a lattice, on which a feature map is created: a meaningful coordinate system for different input features.
• The lattice thus forms a topographic map in which the spatial location of a neuron on the lattice is indicative of the input features it represents.

## Two models

Willshaw-von der Malsburg model:

• input neurons arranged in a 2D lattice
• output neurons arranged in a 2D lattice
• Lateral excitation/inhibition (Mexican hat) gives rise to soft competition
• Normalized Hebbian learning
• Biological motivation

Kohonen model:

• input of any dimension
• output neurons in a 1D, 2D, or 3D lattice
• Relaxed winner-takes-all
• Competitive learning rule
• Computational motivation

## Principles

• Competition: One winner
• Cooperation: Neurons near the winner on the lattice also get a chance to adapt.
• Adaptation: The winner and its neighbors adjust their weight vectors so that their response (discriminant-function value) to similar inputs increases.

Redundancy in the input is needed.

• Redundancy: e.g., a pixel can be predicted from a known neighboring pixel, so the dimensionality can be reduced
• Structure
• Information content relative to channel capacity

## Algorithm

1. Randomly initialize weight vectors $\textbf{w}_i$
2. Randomly sample input vector $\textbf{x}$
3. Find Best Matching Unit (BMU): $i(\textbf{x})=\mbox{argmin}_j \Vert \textbf{x} -\textbf{w}_j \Vert$
4. Update weight vectors: $\textbf{w}_j \leftarrow \textbf{w}_j + \eta h(j,i(\textbf{x}))(\textbf{x}-\textbf{w}_j)$
• $\eta$: learning rate
• $h(j,i(\textbf{x}))$: neighborhood function centered on the BMU
5. Repeat steps 2-4 until the map converges (a minimal sketch of the full loop is given below)
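
A minimal NumPy sketch of this loop (function and parameter names are illustrative), using a 2D lattice and the Gaussian neighborhood defined in the next section; for brevity the learning rate and radius are held fixed here rather than decayed as recommended under Training Tips:

```python
import numpy as np

def train_som(X, rows=10, cols=10, n_iters=5000, eta=0.1, sigma=2.0, seed=0):
    """Minimal SOM training loop; returns weights of shape (rows*cols, dim)."""
    rng = np.random.default_rng(seed)
    dim = X.shape[1]
    # Step 1: randomly initialize the weight vectors w_i
    W = rng.random((rows * cols, dim))
    # Lattice coordinates r_j of every unit j (used by the neighborhood function)
    coords = np.array([(r, c) for r in range(rows) for c in range(cols)], dtype=float)
    for n in range(n_iters):
        # Step 2: randomly sample an input vector x
        x = X[rng.integers(len(X))]
        # Step 3: Best Matching Unit i(x) = argmin_j ||x - w_j||
        bmu = np.argmin(np.linalg.norm(x - W, axis=1))
        # Step 4: Gaussian neighborhood h(j, i(x)) on the lattice
        d2 = np.sum((coords - coords[bmu]) ** 2, axis=1)
        h = np.exp(-d2 / (2 * sigma ** 2))
        # Weight update: w_j <- w_j + eta * h(j, i(x)) * (x - w_j)
        W += eta * h[:, None] * (x - W)
    return W
```

For example, `train_som(np.random.rand(1000, 3))` organizes 1000 random 3-D points onto a 10x10 lattice.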

## Neighborhood functions

• Gaussian: $h(j,i(\textbf{x})) = \exp (-\Vert \textbf{r}_j - \textbf{r}_{i(\textbf{x})} \Vert^2/(2\sigma^2))$
• Flat: $h(j,i(\textbf{x})) = 1 \mbox{ if } \Vert \textbf{r}_j - \textbf{r}_{i(\textbf{x})} \Vert \le \sigma; 0 \mbox{ otherwise }$

Parameters:

• $\sigma$ is called the neighborhood radius
• $\textbf{r}_j$ is the location of unit $j$ on the lattice
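
Both functions follow directly from these definitions; a sketch, assuming `r_j` and `r_bmu` are lattice coordinates given as NumPy arrays:

```python
import numpy as np

def gaussian_neighborhood(r_j, r_bmu, sigma):
    """h(j, i(x)) = exp(-||r_j - r_i(x)||^2 / (2 * sigma^2))."""
    d2 = np.sum((np.asarray(r_j, float) - np.asarray(r_bmu, float)) ** 2, axis=-1)
    return np.exp(-d2 / (2 * sigma ** 2))

def flat_neighborhood(r_j, r_bmu, sigma):
    """h(j, i(x)) = 1 if ||r_j - r_i(x)|| <= sigma, else 0."""
    d = np.sqrt(np.sum((np.asarray(r_j, float) - np.asarray(r_bmu, float)) ** 2, axis=-1))
    return (d <= sigma).astype(float)
```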

## Training Tips

• Start with a large neighborhood radius and gradually decrease it to a small value, e.g. $\sigma(n) = \sigma_0 \exp\left( - \frac{n}{\tau_1}\right)$
• Start with a high learning rate $\eta$ and decrease it in the same way as the radius, e.g. $\eta(n) = \eta_0 \exp\left( - \frac{n}{\tau_2}\right)$

• Self-organization or ordering phase: High learning rate and large neighborhood radius
• Convergence phase: Low learning rate, small neighborhood radius (one or zero)
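
A short sketch of the two decay schedules above (the initial values and time constants are illustrative design choices):

```python
import numpy as np

def sigma_schedule(n, sigma0=5.0, tau1=1000.0):
    """Neighborhood radius: sigma(n) = sigma0 * exp(-n / tau1)."""
    return sigma0 * np.exp(-n / tau1)

def eta_schedule(n, eta0=0.5, tau2=1000.0):
    """Learning rate: eta(n) = eta0 * exp(-n / tau2)."""
    return eta0 * np.exp(-n / tau2)
```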

## Performance measures

• Quantization error: Average distance between each data vector and its BMU $\epsilon_Q=\frac{1}{N}\sum_{j=1}^{N} \Vert \textbf{x}_j - \textbf{w}_{i(\textbf{x}_j)}\Vert$
• Topographic error: The proportion of all data vectors for which first and second BMUs are not adjacent units. $\epsilon_T = \frac{1}{N}\sum_{j=1}^{N} u(\textbf{x}_j)$
• $u(\textbf{x})=1$ if the 1st and 2nd BMUs are not adjacent
• $u(\textbf{x})=0$ otherwise
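
Both measures can be computed directly from the trained weights; a sketch, where `coords` holds the lattice location $\textbf{r}_j$ of each unit and "adjacent" is taken to mean a lattice distance of at most 1 (an assumed convention; others use the 8-neighborhood):

```python
import numpy as np

def quantization_error(X, W):
    """Average distance between each data vector and the weight vector of its BMU."""
    dists = np.linalg.norm(X[:, None, :] - W[None, :, :], axis=2)  # dists[j, i] = ||x_j - w_i||
    return float(dists.min(axis=1).mean())

def topographic_error(X, W, coords):
    """Proportion of data vectors whose 1st and 2nd BMUs are not adjacent on the lattice."""
    dists = np.linalg.norm(X[:, None, :] - W[None, :, :], axis=2)
    best2 = np.argsort(dists, axis=1)[:, :2]              # indices of the two closest units
    lattice_d = np.linalg.norm(coords[best2[:, 0]] - coords[best2[:, 1]], axis=1)
    return float(np.mean(lattice_d > 1.0))                # u(x) = 1 when the units are not adjacent
```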

## Summary

• Hebbian learning rule with a forgetting term
• Input generated according to a certain probability distribution on a continuous input space
• The topology of the network forms on the discrete lattice
• Time-varying neighborhood function around the winner and time-varying learning rate

### Properties

• Approximation of the input space: The collection of weight vectors approximates the input distribution
• Topological ordering: Nearby neurons on the lattice represent similar input features
• Density matching: More neurons are recruited to represent dense areas of the input space
• Feature selection: The best features are selected to approximate the underlying distribution

## High-D inputs

A SOM can be trained with inputs of arbitrary dimension, reducing the dimensionality from N-D to 2-D (the lattice). It can therefore be used to extract topological features and to visualize high-dimensional data.

## Application

• Data clustering and visualization
• Optimization problems
• Semantic maps
• NLP
• Preprocessing for signal and image-processing
• Handwritten character recognition
• Phonetic map for speech recognition

## Vector Quantization

Vector quantization exploits the structure in the input distribution for the purpose of data compression; a SOM can be used to learn the quantizer.

• Input space is partitioned into a number of distinct regions and for each region a reconstruction vector is defined.
• A new input will be represented by the reconstruction vector of the region it falls into.

A vector quantizer that minimizes encoding distortion is called a Voronoi or nearest-neighbor quantizer.
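
A sketch of encoding and decoding with such a nearest-neighbor quantizer, using the trained weight vectors as the codebook of reconstruction vectors (function names are illustrative):

```python
import numpy as np

def encode(X, codebook):
    """Map each input vector to the index of its nearest reconstruction vector."""
    dists = np.linalg.norm(X[:, None, :] - codebook[None, :, :], axis=2)
    return np.argmin(dists, axis=1)

def decode(indices, codebook):
    """Replace each index by its reconstruction vector (the lossy reconstruction)."""
    return codebook[indices]
```

Compression comes from storing only the indices (plus the codebook) instead of the full data vectors.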

### Learning

• Train with SOM in unsupervised mode.
• Then tune weight vectors in a supervised mode:
• If class of input vector matches the class of the best matching weight vector $\textbf{w}_c(n+1)=\textbf{w}_c(n) + \alpha_n[\textbf{x}_i-\textbf{w}_c(n)]$
• If class of input vector does not match the class of the best matching weight vector $\textbf{w}_c(n+1)=\textbf{w}_c(n) - \alpha_n[\textbf{x}_i-\textbf{w}_c(n)]$
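
A sketch of this supervised tuning pass (essentially the LVQ update rule), assuming `W` and `labels_W` are the SOM weight vectors and the class label assigned to each of them, and that $\alpha_n$ decays linearly (an illustrative choice):

```python
import numpy as np

def lvq_tune(W, labels_W, X, labels_X, alpha0=0.05, n_iters=10000, seed=0):
    """Supervised fine-tuning of codebook vectors after unsupervised SOM training."""
    rng = np.random.default_rng(seed)
    W = W.copy()
    for n in range(n_iters):
        k = rng.integers(len(X))
        x, y = X[k], labels_X[k]
        c = np.argmin(np.linalg.norm(x - W, axis=1))   # best matching weight vector w_c
        alpha = alpha0 * (1.0 - n / n_iters)           # decaying alpha_n (illustrative schedule)
        if labels_W[c] == y:
            W[c] += alpha * (x - W[c])                 # classes match: move w_c toward x
        else:
            W[c] -= alpha * (x - W[c])                 # classes differ: move w_c away from x
    return W
```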