Read "Streaming Systems" 1&2, Streaming 101 Read "F1, a distributed SQL database that scales" Read "Zanzibar, Google’s Consistent, Global Authorization System" Read "Spanner, Google's Globally-Distributed Database" Read "Designing Data-intensive applications" 12, The Future of Data Systems IOS development with Swift Read "Designing Data-intensive applications" 10&11, Batch and Stream Processing Read "Designing Data-intensive applications" 9, Consistency and Consensus Read "Designing Data-intensive applications" 8, Distributed System Troubles Read "Designing Data-intensive applications" 7, Transactions Read "Designing Data-intensive applications" 6, Partitioning Read "Designing Data-intensive applications" 5, Replication Read "Designing Data-intensive applications" 3&4, Storage, Retrieval, Encoding Read "Designing Data-intensive applications" 1&2, Foundation of Data Systems Three cases of binary search TAMU Operating System 2 Memory Management TAMU Operating System 1 Introduction Overview in cloud computing 2 TAMU Operating System 7 Virtualization TAMU Operating System 6 File System TAMU Operating System 5 I/O and Disk Management TAMU Operating System 4 Synchronization TAMU Operating System 3 Concurrency and Threading TAMU Computer Networks 5 Data Link Layer TAMU Computer Networks 4 Network Layer TAMU Computer Networks 3 Transport Layer TAMU Computer Networks 2 Application Layer TAMU Computer Networks 1 Introduction Overview in distributed systems and cloud computing 1 A well-optimized Union-Find implementation, in Java A heap implementation supporting deletion TAMU Advanced Algorithms 3, Maximum Bandwidth Path (Dijkstra, MST, Linear) TAMU Advanced Algorithms 2, B+ tree and Segment Intersection TAMU Advanced Algorithms 1, BST, 2-3 Tree and Heap TAMU AI, Searching problems Factorization Machine and Field-aware Factorization Machine for CTR prediction TAMU Neural Network 10 Information-Theoretic Models TAMU Neural Network 9 Principal Component Analysis TAMU Neural Network 8 Neurodynamics TAMU Neural Network 7 Self-Organizing Maps TAMU Neural Network 6 Deep Learning Overview TAMU Neural Network 5 Radial-Basis Function Networks TAMU Neural Network 4 Multi-Layer Perceptrons TAMU Neural Network 3 Single-Layer Perceptrons Princeton Algorithms P1W6 Hash Tables & Symbol Table Applications Stanford ML 11 Application Example Photo OCR Stanford ML 10 Large Scale Machine Learning Stanford ML 9 Anomaly Detection and Recommender Systems Stanford ML 8 Clustering & Principal Component Analysis Princeton Algorithms P1W5 Balanced Search Trees TAMU Neural Network 2 Learning Processes TAMU Neural Network 1 Introduction Stanford ML 7 Support Vector Machine Stanford ML 6 Evaluate Algorithms Princeton Algorithms P1W4 Priority Queues and Symbol Tables Stanford ML 5 Neural Networks Learning Princeton Algorithms P1W3 Mergesort and Quicksort Stanford ML 4 Neural Networks Basics Princeton Algorithms P1W2 Stack and Queue, Basic Sorts Stanford ML 3 Classification Problems Stanford ML 2 Multivariate Regression and Normal Equation Princeton Algorithms P1W1 Union and Find Stanford ML 1 Introduction and Parameter Learning

TAMU Neural Network 7 Self-Organizing Maps



SOM is based on competitive learning where output neurons compete with each other to be activated. (Kohonen, 1982)

  • Output neuron that activates is called winner-takes-all neuron
  • Lateral inhibition is one way to inplement competition for map formation
  • In SOM, neurons are placed on a lattice, on which a feature map is created, a meaningful coordinate system for different features.
  • The lattice thus forms a topographic map where the spatial location on the lattice is indicative of the input features.

Two models

Willshaw-von der Malsburg model:

  • input neurons arranged in 2D lattice
  • output in 2D lattice
  • Lateral excitation/inhibition (Mexican hat) gives rise to soft competition
  • Normalized Hebbian learning
  • Biological motivation

Kohonen model

  • input of any dimension
  • output neurons in 1D, 2D or 3D lattice
  • Relaxed winner-takes-all
  • Competitive learning rule
  • Computational motivation


  • Competition: One winner
  • Cooperation: Neuron near-by the winner on the lattice get a chance to adapt.
  • Adaption: Winner and neighbors increase their func values.

Redundancy in the input is needed.

  • Redundancy, can predict based on a know pixel, can reduce dimension
  • Structure
  • Information content relative to channel capacity


  1. Randomly initialized weight vectors
  2. Randomly sample input vector
  3. Find Best Matching Unit (BMU): \[i(\textbf{x})=\mbox{argmin}_j \Vert \textbf{x} -\textbf{w}_j \Vert\]
  4. Update weight vectors: \[\textbf{w}_j \leftarrow \textbf{w}_j + \eta h(j,i(\textbf{x}))(\textbf{x}-\textbf{w}_j)\]
    • : learning rate
    • : neighborhood function of BMU
  5. Repeat steps 2-4

Neighborhood functions

  • Gaussian:
  • Flat:


  • is called the neighborhood radius
  • is the location of unit on the lattice

Training Tips

  • Start with large neighborhood radius. Gradually decrease radius to a small value \[\sigma(n) = \sigma_0 \exp\left( - \frac{n}{\tau_1}\right)\]
  • Start with high learning rate and decrease it like radius

Two phases of adaptation

  • Self-organization or ordering phase: High learning rate and large neighborhood radius
  • Convergence phase: Low learning rate, small neighborhood radius (one or zero)

Performance measures

  • Quantization error : Average distance between each data vector and its BMU \[\epsilon_Q=\frac{1}{N}\sum_{j=1}^{N} \Vert \textbf{x}_j - \textbf{w}_i(\textbf{x}_j)\Vert\]
  • Topographic error: The proportion of all data vectors for which first and second BMUs are not adjacent units. \[\epsilon_T = \frac{1}{N}\sum_{j=1}^{N} u(\textbf{x}_j)\]
    • if the 1st and 2nd BMUs are not adjacent
    • otherwise


  • Hebbian learning rule with forgetting term
  • Input generated according to a certain probability distribution on a continuous input space
  • Topology of network form on the discrete lattice
  • Time varying neighborhood function around winner and learning rate


  • Approximation of the input space: collection of weight vectors
  • Topological ordering: Near-by neurons on the lattice represent similar input features
  • Density matching: More neurons are recruited to represent dense area in the input space
  • Feature selection: Select best features to approximate the underlying distribution

High-D inputs

SOM can be trained with inputs of arbitrary dimension. It can reduce dimensionality from N-D to 2-D. Can be used to extract topological features and visualize data.


  • Data clustering and visualization
  • Optimization problems
  • Semantic maps
    • NLP
  • Preprocessing for signal and image-processing
    • Hand-written recognition
    • Phonetic map for speech recognition

Vector Quantization

Exploits the structure in the input distribution for the purpose of data compression. SOM can be used to learn it.

  • Input space is partitioned into a number of distinct regions and for each region a reconstruction vector is defined.
  • A new input will be represented by the reconstruction vector of the region it falls into.

A vector quantizer that minimizes encoding distortion is called a Voronoi or nearest-neighbor quantizer.


  • Train with SOM in unsupervised mode.
  • Then tune weight vectors in a supervised mode:
    • If class of input vector matches the class of the best matching weight vector \[\textbf{w}_c(n+1)=\textbf{w}_c(n) + \alpha_n[\textbf{x}_i-\textbf{w}_c(n)]\]
    • If class of input vector does not match the class of the best matching weight vector \[\textbf{w}_c(n+1)=\textbf{w}_c(n) - \alpha_n[\textbf{x}_i-\textbf{w}_c(n)]\]

Creative Commons License
Melon blog is created by melonskin. This work is licensed under a Creative Commons Attribution-NonCommercial 4.0 International License.
© 2016-2019. All rights reserved by melonskin. Powered by Jekyll.