TAMU Neural Network 7 Self-Organizing Maps

Introduction

SOM is based on competitive learning where output neurons compete with each other to be activated. (Kohonen, 1982)

Output neuron that activates is called winner-takes-all neuron
Lateral inhibition is one way to inplement competition for map formation
In SOM, neurons are placed on a lattice, on which a feature map is created, a meaningful coordinate system for different features.
The lattice thus forms a topographic map where the spatial location on the lattice is indicative of the input features.

Two models

Willshaw-von der Malsburg model:

input neurons arranged in 2D lattice
output in 2D lattice
Lateral excitation/inhibition (Mexican hat) gives rise to soft competition
Normalized Hebbian learning
Biological motivation

Kohonen model

input of any dimension
output neurons in 1D, 2D or 3D lattice
Relaxed winner-takes-all
Competitive learning rule
Computational motivation

Principles

Competition: One winner
Cooperation: Neuron near-by the winner on the lattice get a chance to adapt.
Adaption: Winner and neighbors increase their func values.

Redundancy in the input is needed.

Redundancy, can predict based on a know pixel, can reduce dimension
Structure
Information content relative to channel capacity

Algorithm

Randomly initialized weight vectors $\textbf{w}_i$
Randomly sample input vector $\textbf{x}$
Find Best Matching Unit (BMU): \[i(\textbf{x})=\mbox{argmin}_j \Vert \textbf{x} -\textbf{w}_j \Vert\]
Update weight vectors: \[\textbf{w}_j \leftarrow \textbf{w}_j + \eta h(j,i(\textbf{x}))(\textbf{x}-\textbf{w}_j)\]
- $\eta$ : learning rate
- $h(j,i(\textbf{x}))$ : neighborhood function of BMU
Repeat steps 2-4

Neighborhood functions

Gaussian: $h(j,i(\textbf{x})) = \exp (-\Vert \textbf{r}_j - \textbf{r}_{i(\textbf{x})} \Vert^2/(2\sigma^2))$
Flat: $h(j,i(\textbf{x})) = 1 \mbox{ if } \Vert \textbf{r}_j - \textbf{r}_{i(\textbf{x})} \Vert \le \sigma; 0 \mbox{ otherwise }$

Params:

$\sigma$ is called the neighborhood radius
$\textbf{r}_j$ is the location of unit $j$ on the lattice

Training Tips

Start with large neighborhood radius. Gradually decrease radius to a small value \[\sigma(n) = \sigma_0 \exp\left( - \frac{n}{\tau_1}\right)\]
Start with high learning rate $\eta$ and decrease it like radius

Two phases of adaptation

Self-organization or ordering phase: High learning rate and large neighborhood radius
Convergence phase: Low learning rate, small neighborhood radius (one or zero)

Performance measures

Quantization error : Average distance between each data vector and its BMU \[\epsilon_Q=\frac{1}{N}\sum_{j=1}^{N} \Vert \textbf{x}_j - \textbf{w}_i(\textbf{x}_j)\Vert\]
Topographic error: The proportion of all data vectors for which first and second BMUs are not adjacent units. \[\epsilon_T = \frac{1}{N}\sum_{j=1}^{N} u(\textbf{x}_j)\]
- $u(\textbf{x})=1$ if the 1st and 2nd BMUs are not adjacent
- $u(\textbf{x})=0$ otherwise

Summary

Hebbian learning rule with forgetting term
Input generated according to a certain probability distribution on a continuous input space
Topology of network form on the discrete lattice
Time varying neighborhood function around winner and learning rate

Properties

Approximation of the input space: collection of weight vectors
Topological ordering: Near-by neurons on the lattice represent similar input features
Density matching: More neurons are recruited to represent dense area in the input space
Feature selection: Select best features to approximate the underlying distribution

High-D inputs

SOM can be trained with inputs of arbitrary dimension. It can reduce dimensionality from N-D to 2-D. Can be used to extract topological features and visualize data.

Application

Data clustering and visualization
Optimization problems
Semantic maps
- NLP
Preprocessing for signal and image-processing
- Hand-written recognition
- Phonetic map for speech recognition

Vector Quantization

Exploits the structure in the input distribution for the purpose of data compression. SOM can be used to learn it.

Input space is partitioned into a number of distinct regions and for each region a reconstruction vector is defined.
A new input will be represented by the reconstruction vector of the region it falls into.

A vector quantizer that minimizes encoding distortion is called a Voronoi or nearest-neighbor quantizer.

Learning

Train with SOM in unsupervised mode.
Then tune weight vectors in a supervised mode:
- If class of input vector matches the class of the best matching weight vector \[\textbf{w}_c(n+1)=\textbf{w}_c(n) + \alpha_n[\textbf{x}_i-\textbf{w}_c(n)]\]
- If class of input vector does not match the class of the best matching weight vector \[\textbf{w}_c(n+1)=\textbf{w}_c(n) - \alpha_n[\textbf{x}_i-\textbf{w}_c(n)]\]

AI 2

Algorithm 17

Amazon 1

Authorization 1

Blog 3

Bootstrap 1

C++ 1

CCpp 5

CSS 2

Cloud 3

Code 1

Crawler 1

DNS 1

Database 17

DeepLearning 1

Design 17

Development 1

Docker 1

English 1

Express 1

GDB 1

Go 3

Google 4

HTML 3

IOS 1

Java 17

Javascript 4

Jekyll 1

Linux 4

MacOS 2

MachineLearning 17

Markdown 4

Mobile 1

MongoDB 2

Multi-threading 3

NAS 1

Network 11

NeuralNetwork 10

Node 1

OS 8

Public-speaking 1

Python 15

RESTful 1

Rails 9

React 1

Redis 1

Ruby 6

Shell 2

Spring 2

System 17

TCP 1

TDD 1

Thread 2

Vim 1

awk 1

git 1

jQuery 1

media 1

network 1

php 1