 five basic learning rules
 error correction
 Hebbian
 memory-based
 competitive
 Boltzmann
 learning paradigms
 credit assignment problem
 supervised learning
 unsupervised learning
 learning tasks, memory and adaptation
 probabilistic and statistical aspects of learning
learning rules
error-correction learning
 input $x(n)$, output $y_k(n)$, and desired response or target output $d_k(n)$
 error signal: $e_k(n) = d_k(n) - y_k(n)$
 actuates a control mechanism that gradually adjusts the synaptic weights to minimize the cost function (or index of performance)
 cost function:
\[\mathcal{E}(n) = \frac{1}{2} e_k^2(n)\]
when synaptic weights reach a steady state, learning is stopped.
 Widrow-Hoff rule, with learning rate $\eta$:
\[\Delta w_{kj}(n) = \eta e_k(n) x_j(n)\]
 With that, we can update the weights:
\[w_{kj}(n+1) = w_{kj}(n) + \Delta w_{kj}(n)\]
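A minimal Python sketch of this error-correction loop (the toy linear task, the learning rate, and all variable names are assumptions for illustration, not from the notes):

```python
import numpy as np

rng = np.random.default_rng(0)

# Assumed toy task: a noisy linear target d = w_true . x + noise.
w_true = np.array([2.0, -1.0, 0.5])
X = rng.normal(size=(200, 3))
d = X @ w_true + 0.1 * rng.normal(size=200)

eta = 0.01                  # learning rate
w = np.zeros(3)             # synaptic weights w_kj
for n in range(len(X)):
    y = w @ X[n]            # output y_k(n)
    e = d[n] - y            # error signal e_k(n) = d_k(n) - y_k(n)
    w += eta * e * X[n]     # Widrow-Hoff update: Delta w = eta * e * x
print(w)                    # should drift toward w_true
```

Training can be stopped once the weights reach a steady state, as noted above.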
memory-based learning
 All (or most) past experiences are explicitly stored, as input-target pairs $\{(x_i, d_i)\}_{i=1}^{N}$
 Two classes: $\mathcal{C}_1$ and $\mathcal{C}_2$
 Given a new input $x_{test}$, determine its class based on the local neighborhood of $x_{test}$.
 Criterion used for determining the neighborhood
 Learning rule applied to the neighborhood of the input, within the set of training examples.
nearest neighbor
Nearest neighbor of $x_{test}$:
\[\min_i d(x_i, x_{test}) = d(x_{N'}, x_{test})\]
where $d(\cdot, \cdot)$ is the Euclidean distance.
 $x_{test}$ is classified as the same class as $x_{N'}$.
k-nearest neighbor
 Identify the k classified patterns that lie nearest to the test vector $x_{test}$, for some integer k.
 Assign $x_{test}$ to the class that is most frequently represented by those k neighbors (majority vote; see the sketch below).
 In effect, it acts like a local average, so it can deal with outliers.
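A small Python sketch of the k-NN decision (the helper name knn_classify and the toy data are made up for illustration):

```python
import numpy as np
from collections import Counter

def knn_classify(x_test, X_train, labels, k=3):
    # Euclidean distances from the test vector to all stored pairs.
    dists = np.linalg.norm(X_train - x_test, axis=1)
    nearest = np.argsort(dists)[:k]              # k nearest neighbors
    votes = Counter(labels[i] for i in nearest)  # majority vote
    return votes.most_common(1)[0][0]

# An outlier of class 1 sits inside the class-0 cluster; with k=3 the
# majority vote overrides it.
X_train = np.array([[0, 0], [0, 1], [1, 0], [5, 5], [5, 6], [0.2, 0.2]])
labels = np.array([0, 0, 0, 1, 1, 1])
print(knn_classify(np.array([0.1, 0.1]), X_train, labels, k=3))  # -> 0
```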
Hebbian Learning
 If two neurons on either side of a synapse are activated simultaneously, the synapse is strengthened.
 If they are activated asynchronously, the synapse is weakened or eliminated.
Hebbian learning with learning rate $\eta$:
\[\Delta w_{kj}(n) = \eta y_k(n) x_j(n)\]
Covariance rule, with presynaptic and postsynaptic averages $\bar{x}$ and $\bar{y}$:
\[\Delta w_{kj} = \eta (x_j - \bar{x})(y_k - \bar{y})\]
covariance rule
 convergence to a nontrivial state
 prediction of both potentiation and depression
 observations
 weight enhanced when both pre- and postsynaptic activities are above average
 weight depressed when
 pre more than average, post less than average
 pre less than average, post more than average
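Both rules as Python update functions (the names and the default learning rate are illustrative assumptions):

```python
import numpy as np

def hebb_update(w, x, y, eta=0.01):
    # Plain Hebb: strengthen w_kj when presynaptic x_j and postsynaptic
    # y_k are active together.
    return w + eta * y * x

def covariance_update(w, x, y, x_bar, y_bar, eta=0.01):
    # Covariance rule: the product of deviations from the averages gives
    # enhancement when pre and post deviate in the same direction and
    # depression when they deviate in opposite directions.
    return w + eta * (y - y_bar) * (x - x_bar)
```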
competitive learning
 Output neurons compete with each other for a chance to become active.
 Well suited to discovering statistically salient features (that may aid in classification)
 three basic elements:
 Same type of neurons with different weight sets, so that they respond differently to a given set of inputs
 A limit imposed on the strength of each neuron
 A competition mechanism, to choose one winner: the winner-takes-all neuron.
\[\Delta w_{kj} = \eta (x_j - w_{kj}) \mbox{ if } k \mbox{ is the winner, and } \Delta w_{kj} = 0 \mbox{ otherwise}\]
The synaptic weight vector of the winner is moved toward the input vector.
Weight vectors converge toward local input clusters: clustering
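A minimal clustering sketch in Python, assuming toy Gaussian-blob data; the winner is taken to be the neuron whose weight vector is closest to the input (for normalized vectors this matches picking the largest activation):

```python
import numpy as np

rng = np.random.default_rng(1)

# Assumed data: three blobs the network should discover.
centers = np.array([[0.0, 0.0], [4.0, 4.0], [0.0, 4.0]])
X = np.concatenate([c + 0.3 * rng.normal(size=(100, 2)) for c in centers])
rng.shuffle(X)

eta = 0.05
W = rng.normal(size=(3, 2))   # one weight vector per output neuron

for x in X:
    k = np.argmin(np.linalg.norm(W - x, axis=1))  # winner-takes-all
    W[k] += eta * (x - W[k])  # move only the winner toward the input
print(W)                      # rows should settle near the cluster centers
```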
Boltzmann Learning
 Stochastic learning algorithm rooted in statistical mechanics
 Recurrent network, binary neurons (+1 or -1)
 Energy function:
\[E = -\frac{1}{2} \sum_j \sum_{k, k \neq j} w_{kj} x_k x_j\]
 Activation:
 Choose a random neuron $k$
 Flip its state with the probability below (given temperature $T$)
 \[P(x_k \rightarrow -x_k) = \left(1 + \exp(\Delta E_k / T)\right)^{-1}\]
 $\Delta E_k$ is the change in $E$ due to the flip
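One such stochastic update as a Python sketch, assuming symmetric weights with a zero diagonal (no self-connections):

```python
import numpy as np

rng = np.random.default_rng(2)

def boltzmann_step(x, W, T):
    # Pick a random neuron k; flipping x_k changes the energy by
    # dE = 2 * x_k * sum_j w_kj x_j (zero diagonal assumed), and the
    # flip is accepted with probability 1 / (1 + exp(dE / T)).
    k = rng.integers(len(x))
    dE = 2 * x[k] * (W[k] @ x)
    if rng.random() < 1.0 / (1.0 + np.exp(dE / T)):
        x[k] = -x[k]
    return x
```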
Boltzmann Machine
Types of neurons
 Visible: can be affected by the environment
 Hidden: isolated
Types of operations
 Clamped: visible neurons are fixed by environmental input and held constant
 Free-running: all neurons update their activity freely.
 Learning
 Update the weights using correlations measured in both the clamped and free-running conditions:
\[\Delta w_{kj} = \eta (\rho^{+}_{kj} - \rho^{-}_{kj})\]
where $\rho^{+}_{kj}$ and $\rho^{-}_{kj}$ are the correlations between the states of neurons $k$ and $j$ in the clamped and free-running conditions, respectively
 Train the weights with various clamped input patterns
 After training is completed, present a new clamping input pattern that is a partial version of one of the known vectors
 Let it run clamped on the new input (a subset of the visible neurons), and eventually it will complete the pattern (a sketch follows)
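A sketch of that recall procedure, reusing the flip dynamics above; the annealing schedule and step count are arbitrary illustrative choices:

```python
import numpy as np

rng = np.random.default_rng(3)

def complete(x, W, clamped, T=1.0, steps=5000, cooling=0.999):
    # Hold the clamped visible units at the partial input; let the free
    # units settle stochastically while the temperature is lowered.
    for _ in range(steps):
        k = rng.integers(len(x))
        if k in clamped:
            continue                    # clamped neurons never flip
        dE = 2 * x[k] * (W[k] @ x)      # energy change for the flip
        if rng.random() < 1.0 / (1.0 + np.exp(dE / T)):
            x[k] = -x[k]
        T *= cooling                    # gradual cooling
    return x
```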
Learning Paradigms
credit assignment
Assign credit or blame for overall outcome to individual decisions.
 Temporal credit-assignment problem: credit for outcomes of actions
 Structural credit-assignment problem: credit for actions to internal decisions
learning with a teacher
learning without a teacher
two classes
 reinforcement learning/neurodynamic programming
 unsupervised learning/self-organization
reinforcement
Goal is to optimize (minimize) the cumulative cost of actions.
In many cases, learning is under delayed reinforcement.
unsupervised
Based on a task-independent measure of the quality of the representation the network learns.
learning tasks, memory and adaptation
pattern association
Associate a key pattern with a memorized pattern.
pattern classification
Mapping between an input pattern and a prescribed number of classes.
function approximation
Nonlinear input-output mapping $d = f(x)$.
System identification: learn the function of an unknown system.
control
Control a plant: adjust the plant input $u$ so that the plant output $y$ tracks the reference signal $d$.
filtering/beamforming
Filtering: estimate a quantity at time $n$, based on measurements up to time $n$
Smoothing: estimate a quantity at time $n$, based on measurements up to time $n + a$, $a > 0$
Prediction: estimate a quantity at time $n + a$, based on measurements up to time $n$, $a > 0$
memory
$q$ pattern pairs $(x_k, y_k)$, $k = 1, \ldots, q$:
 Input (key) vector: $x_k$
 Output (memorized) vector: $y_k$
Each association is represented by a weight matrix:
\[y_k = W(k)x_k\]
Let
\[M = \sum_{k=1}^{q} W(k)=\sum_{k=1}^{q} y_k x_k^T\]
If all key vectors $x_k$ are normalized to length 1 and mutually orthogonal, then:
\[M x_j = y_j\]
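A small numpy illustration (the keys and memorized vectors are made-up examples):

```python
import numpy as np

# Orthonormal keys (here simply the standard basis) and arbitrary
# memorized vectors.
keys = np.eye(3)                                  # x_1, x_2, x_3
mems = np.array([[1., 2.], [3., 4.], [5., 6.]])   # y_1, y_2, y_3

# Memory matrix as a sum of outer products: M = sum_k y_k x_k^T
M = sum(np.outer(y, x) for x, y in zip(keys, mems))

print(M @ keys[1])   # [3. 4.]: exact recall because the keys are orthonormal
```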
adaptation
 stationary environment: supervised learning can be used to obtain a relatively stable set of parameters
 nonstationary environment: parameters need to be adapted over time
 locally stationary: adapt over windows short enough that the statistics are approximately constant
statistical nature of learning
Target function: $f(\cdot)$
Neural network realization of the function: $F(x, w)$, or $F(x, T)$ to make the dependence on the training set $T$ explicit
Random input vector $X$ and random output scalar $D$
Training set: $T = \{(x_i, d_i)\}_{i=1}^{N}$
Regressive model: $D = f(X) + \epsilon$
The error term $\epsilon$ has zero mean given $x$, so that:
\[E[D \vert x] = f(x)\]
The error term is uncorrelated with the regression function:
\[E[\epsilon f(X)] = 0\]
Cost function:
\[\mathcal{E}(w) = \frac{1}{2} \sum_{i=1}^{N} (d_i - F(x_i, w))^2\]
\[\mathcal{E}(w) = \frac{1}{2} E_T[\epsilon^2] + E_T[\epsilon (f(x) - F(x,T))] + \frac{1}{2} E_T[(f(x) - F(x,T))^2]\]
The first term is the intrinsic error; the second reduces to 0 (by the zero-mean property of $\epsilon$); we are interested in the third term.
bias and variance
 bias: how much the average response $E_T[F(x,T)]$ differs from the true function $f(x)$; approximation error:
\[B(w) = E_T[F(x,T)] - f(x)\]
 variance: the variance of $F(x,T)$ over the entire training set $T$; estimation error:
\[V(w) = E_T[(F(x,T) - E_T[F(x,T)])^2]\]
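Combining the two gives the standard bias-variance decomposition of the third term above (a well-known identity, stated here for completeness):
\[E_T[(f(x) - F(x,T))^2] = B(w)^2 + V(w)\]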
VC dimension
Vapnik-Chervonenkis
Shattering: a dichotomy of a set S is a partition of S into two disjoint subsets.
A set of instances S is shattered by a function class $\mathcal{F}$ if and only if for every dichotomy of S there exists some function in $\mathcal{F}$ consistent with this dichotomy.
The VC dimension of $\mathcal{F}$ is the size of the largest finite set of instances shattered by $\mathcal{F}$.
It is enough that at least one subset of a given size can be shattered; not every subset of that size needs to be.
If $\mathcal{F}$ is the set of lines in the plane, its VC dimension is 3: three points in general position can be shattered (as checked below), but no set of four points can.
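A brute-force check of these two claims in Python: each dichotomy is tested for linear separability with a perceptron, whose iteration cap is a heuristic for the non-separable case (all names here are illustrative):

```python
import numpy as np
from itertools import product

def separable(points, labels, iters=1000):
    # The perceptron converges iff the labeling is linearly separable;
    # the iteration cap handles the non-separable case.
    X = np.hstack([points, np.ones((len(points), 1))])  # bias input
    w = np.zeros(X.shape[1])
    for _ in range(iters):
        mistakes = 0
        for x, y in zip(X, labels):
            if y * (w @ x) <= 0:       # misclassified (or on boundary)
                w += y * x
                mistakes += 1
        if mistakes == 0:
            return True
    return False

def shattered(points):
    # Shattered by lines iff every +1/-1 labeling is separable.
    pts = np.asarray(points, dtype=float)
    return all(separable(pts, np.array(lab))
               for lab in product([-1, 1], repeat=len(pts)))

print(shattered([(0, 0), (1, 0), (0, 1)]))          # True: 3 points
print(shattered([(0, 0), (1, 1), (1, 0), (0, 1)]))  # False: XOR dichotomy
```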
As the VC dimension increases:
 training error decreases
 the confidence-interval term of the generalization bound increases
 sample complexity increases