Inclusion of feedback gives temporal characteristics to neural networks: recurrent networks.
Recurrent networks can be stable or unstable.
Our main interest is in the stability of recurrent networks: neurodynamics.
Preliminaries: Dynamic systems
- A dynamic system is a system whose state varies with time
- State-space model: values of state variables change over time
Example:
- A system of order $N$
- $x_1(t), x_2(t), \dots, x_N(t)$ are state variables that hold different values under the independent variable $t$ (time).
- $\textbf{x}(t) = [x_1(t), x_2(t), \dots, x_N(t)]^T$ is called the state vector.
- The dynamics of the system is \[\frac{d}{dt}x_j(t) = F_j(\textbf{x}(t)), \quad j=1,2,\dots,N\]
Or, in vector form:
\[\frac{d}{dt}\textbf{x}(t) = \textbf{F}(\textbf{x}(t))\]
System type
- Autonomous: $\textbf{F}$ does not explicitly depend on time.
- Non-autonomous: $\textbf{F}$ explicitly depends on time.
$\textbf{F}(\textbf{x})$ can be seen as a velocity vector field: every point of the state space is associated with a velocity vector $\textbf{F}(\textbf{x})$.
State space
Can view the state-space equation as describing the motion of a point in an $N$-dimensional state space.
The set of points traversed over time is called the trajectory or orbit. The tangent vector along the trajectory is the instantaneous velocity.
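As an illustration (not part of the original notes), the sketch below traces a trajectory numerically by forward Euler integration; the particular vector field (a damped linear oscillator) and the step size are assumptions chosen for the example.

```python
import numpy as np

def F(x):
    # Illustrative velocity vector field (damped linear oscillator):
    # dx1/dt = x2, dx2/dt = -x1 - 0.5 * x2
    return np.array([x[1], -x[0] - 0.5 * x[1]])

def trajectory(x0, dt=0.01, steps=2000):
    """Trace the orbit of the state vector x(t) by forward Euler integration."""
    xs = [np.asarray(x0, dtype=float)]
    for _ in range(steps):
        xs.append(xs[-1] + dt * F(xs[-1]))  # x(t + dt) ~ x(t) + dt * F(x(t))
    return np.array(xs)

orbit = trajectory([1.0, 0.0])
print(orbit[-1])  # the state spirals toward the equilibrium at the origin
```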
Solution of the equation
- Existence: a solution exists if $\textbf{F}(\textbf{x})$ is continuous in all of its arguments.
- Uniqueness: $\textbf{F}(\textbf{x})$ must meet the Lipschitz condition.
Lipschitz condition
Let $\textbf{x}$ and $\textbf{u}$ be a pair of vectors in an open set in a normed vector space. For some constant $K$, a vector function $\textbf{F}(\textbf{x})$ satisfies the Lipschitz condition if:
\[\Vert \textbf{F}(\textbf{x}) - \textbf{F}(\textbf{u}) \Vert \le K \Vert \textbf{x} - \textbf{u} \Vert\]
- $K$ is called the Lipschitz constant for $\textbf{F}(\textbf{x})$.
If the partial derivatives $\partial F_i / \partial x_j$ are finite everywhere, $\textbf{F}(\textbf{x})$ meets the Lipschitz condition.
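As an added illustration (the specific model form is an assumption, not from the original notes): for the additive model $\textbf{F}(\textbf{x}) = -\textbf{x} + \textbf{W}\tanh(\textbf{x})$, with $\tanh$ applied element-wise, all partial derivatives are bounded, and since $\tanh$ is 1-Lipschitz,
\[\Vert \textbf{F}(\textbf{x}) - \textbf{F}(\textbf{u}) \Vert \le \left(1 + \Vert \textbf{W} \Vert\right) \Vert \textbf{x} - \textbf{u} \Vert,\]
so $K = 1 + \Vert \textbf{W} \Vert$ is a valid Lipschitz constant.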
Equilibrium states
$\bar{\textbf{x}}$ is an equilibrium state when:
\[\left.\frac{d\textbf{x}}{dt} \right\vert_{\textbf{x}= \bar{\textbf{x}}} = \textbf{F}(\bar{\textbf{x}})=0\]
Linearize using a Taylor expansion around $\bar{\textbf{x}}$, writing $\textbf{x}(t) = \bar{\textbf{x}} + \Delta\textbf{x}(t)$:
\[\textbf{F}(\textbf{x}) \approx \textbf{F}(\bar{\textbf{x}}) + \textbf{A} \Delta \textbf{x}(t)\]
where $\textbf{A}$ is the Jacobian matrix:
\[\textbf{A} = \left.\frac{\partial}{\partial \mathbf{x}} \textbf{F}(\textbf{x}) \right\vert_{\textbf{x} =\bar{\textbf{x}}}\]
Stability of linearized system
The Jacobian matrix $\textbf{A}$ determines the behavior near the equilibrium point (via the eigenvalues of $\textbf{A}$). Since $\textbf{F}(\bar{\textbf{x}}) = 0$,
\[\frac{d}{dt}\Delta \textbf{x}(t) \approx \textbf{A} \Delta \textbf{x}(t)\]
The eigenvalues $\lambda$ of $\textbf{A}$ satisfy:
\[(\textbf{A}-\lambda \textbf{I}) \textbf{x} = 0\]
Example
2nd-Order System: classification of the equilibrium by the eigenvalues of $\textbf{A}$ (see the sketch after this list)
- Stable node (both eigenvalues real and negative)
- Unstable focus (complex conjugate pair with positive real part)
- Saddle point (real eigenvalues, one positive and one negative)
- Stable focus (complex conjugate pair with negative real part)
- Unstable node (both eigenvalues real and positive)
- Center (complex conjugate pair with zero real part)
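A minimal sketch of this classification in code; the Jacobian used at the end is an illustrative assumption.

```python
import numpy as np

def classify_equilibrium(A):
    """Classify the equilibrium of a 2nd-order linearized system d/dt dx = A dx."""
    eig = np.linalg.eigvals(A)
    re, im = eig.real, eig.imag
    if np.all(np.abs(im) < 1e-12):          # both eigenvalues real
        if np.all(re < 0):
            return "stable node"
        if np.all(re > 0):
            return "unstable node"
        return "saddle point"                # one positive, one negative
    if np.all(re < 0):
        return "stable focus"
    if np.all(re > 0):
        return "unstable focus"
    return "center"                          # purely imaginary pair

A = np.array([[0.0, 1.0],
              [-1.0, -0.5]])                 # illustrative Jacobian at the equilibrium
print(classify_equilibrium(A))               # -> stable focus
```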
Definition of stability
The equilibrium state $\bar{\textbf{x}}$ is:
- Uniformly stable: if for an arbitrary $\epsilon > 0$ there exists a positive $\delta$ such that $\Vert \textbf{x}(0) - \bar{\textbf{x}} \Vert < \delta$ implies $\Vert \textbf{x}(t) - \bar{\textbf{x}} \Vert < \epsilon$ for all $t > 0$.
- Convergent: if there exists a positive $\delta$ such that $\Vert \textbf{x}(0) - \bar{\textbf{x}} \Vert < \delta$ implies $\textbf{x}(t) \rightarrow \bar{\textbf{x}}$ as $t \rightarrow \infty$.
- Asymptotically stable: if it is both stable and convergent.
- Globally asymptotically stable: if it is stable and all trajectories of the system converge to $\bar{\textbf{x}}$ as time approaches infinity.
Lyapunov's Theorem
- Theorem 1: The equilibrium state $\bar{\textbf{x}}$ is stable if, in a small neighborhood of $\bar{\textbf{x}}$, there exists a positive definite function $V(\textbf{x})$ such that its derivative with respect to time is negative semidefinite in that region.
- Theorem 2: The equilibrium state $\bar{\textbf{x}}$ is asymptotically stable if, in a small neighborhood of $\bar{\textbf{x}}$, there exists a positive definite function $V(\textbf{x})$ such that its derivative with respect to time is negative definite in that region.
- A scalar function $V(\textbf{x})$ that satisfies these conditions is called a Lyapunov function for the equilibrium state $\bar{\textbf{x}}$ (a worked example follows below).
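A standard worked example (added here for illustration): for the system $\frac{d}{dt}\textbf{x} = -\textbf{x}$ with equilibrium $\bar{\textbf{x}} = 0$, the candidate $V(\textbf{x}) = \frac{1}{2}\Vert \textbf{x} \Vert^2$ is positive definite and
\[\frac{dV}{dt} = \textbf{x}^T \frac{d\textbf{x}}{dt} = -\Vert \textbf{x} \Vert^2 < 0 \quad \text{for } \textbf{x} \ne 0,\]
so by Theorem 2 the origin is asymptotically stable.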
Attractors
- Dissipative systems are characterized by attracting sets or manifolds of dimensionality lower than that of the embedding space. These are called attractors.
- Regions of initial conditions of nonzero state space volume converge to these attractors as time increases.
Types:
- Point attractors
- Limit cycle attractors
- Strange (chaotic) attractors
Neurodynamical models
We will focus on state variables that are continuous-valued.
Properties:
- Large number of DOF
- Nonlinearity
- Dissipative (as opposed to conservative), i.e. an open system
- Noise
Manipulation of attractors
As a recurrent neural network paradigm:
- Can identify attractors with computational objects.
- In order to do so, one must exercise control over the locations of the attractors in the state space of the system.
- A learning algorithm manipulates the equations governing the dynamical behavior so that the attractors are placed at desired locations.
- One good way to do this is the energy-minimization paradigm (e.g., the Hopfield model).
Hopfield model
- $N$ units with full connections among all nodes (no self-feedback).
- Given $M$ input patterns, each having the same dimensionality as the network, they can be memorized as attractors of the network.
- Starting from an initial pattern, the dynamics converge toward the attractor of the basin of attraction in which the initial pattern lies.
Discrete Hopfield model
Based on McCulloch-Pitts model
Energy function is defined as
\[E = - \frac{1}{2}\sum_{i=1}^{N}\sum_{j=1, j \ne i}^{N} w_{ji} x_i x_j\]
Network dynamics will evolve in the direction that minimizes $E$.
Content-Addressable Memory
- Map a set of patterns to be memorized onto fixed points of the dynamical system realized by the recurrent network.
- Encoding: mapping from a pattern $\xi_\mu$ to a fixed point $\bar{\textbf{x}}_\mu$.
- Decoding: mapping from a fixed point $\bar{\textbf{x}}_\mu$ back to the pattern $\xi_\mu$.
Storage
Learning is similar to Hebbian learning (outer-product rule):
\[w_{ji}=\frac{1}{N}\sum_{\mu=1}^{M}\xi_{\mu,j} \xi_{\mu,i}\]
with $w_{jj} = 0$ for all $j$ (no self-feedback).
Matrix form:
\[\textbf{W}=\frac{1}{N}\sum_{\mu=1}^{M} \xi_\mu \xi_\mu^T -\frac{M}{N} \textbf{I}\]
The resulting weight matrix $\textbf{W}$ is symmetric with a zero diagonal.
Activation
- Initialize the network with the probe pattern: \[x_j(0) = \xi_{\text{probe},j}, \quad j = 1,2,\dots,N\]
- Update the output of each neuron (picking neurons at random, i.e. asynchronously) as \[x_j(n+1)=\operatorname{sgn}\left( \sum_{i=1}^{N} w_{ji}x_i(n) \right)\] until the network reaches a fixed point (a code sketch follows below).
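A minimal runnable sketch of the storage and retrieval steps above, with pattern values in $\{-1,+1\}$; the specific patterns and the noisy probe are illustrative assumptions.

```python
import numpy as np

def train(patterns):
    """Outer-product (Hebbian) storage: W = (1/N) * sum_mu xi_mu xi_mu^T, zero diagonal."""
    patterns = np.asarray(patterns, dtype=float)
    M, N = patterns.shape
    W = patterns.T @ patterns / N
    np.fill_diagonal(W, 0.0)               # no self-feedback
    return W

def energy(W, x):
    """Hopfield energy E = -1/2 * sum_{i != j} w_ji x_i x_j (diagonal of W is zero)."""
    return -0.5 * x @ W @ x

def recall(W, probe, max_sweeps=100, seed=0):
    """Asynchronous updates x_j <- sgn(sum_i w_ji x_i) until a fixed point is reached."""
    rng = np.random.default_rng(seed)
    x = np.array(probe, dtype=float)
    for _ in range(max_sweeps):
        changed = False
        for j in rng.permutation(len(x)):   # pick neurons in random order
            new = 1.0 if W[j] @ x >= 0 else -1.0
            if new != x[j]:
                x[j] = new
                changed = True
        if not changed:                     # fixed point: no neuron changed in a full sweep
            break
    return x

patterns = np.array([[1, -1, 1, -1, 1, -1],
                     [1, 1, -1, -1, 1, 1]])
W = train(patterns)
probe = np.array([1, -1, 1, -1, -1, -1])    # noisy version of the first stored pattern
result = recall(W, probe)
print(result, energy(W, result))
```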
Spurious states
- $\textbf{W}$ is symmetric, thus its eigenvalues are all real.
- For a large number of patterns $M$, the matrix is degenerate, i.e., several eigenvectors share the same eigenvalue.
- These eigenvectors form a subspace; the subspace associated with the eigenvalue 0 is called the null space.
- The null space exists because the number of patterns $M$ is smaller than the number of neurons $N$.
- Hopfield network as content addressable memory:
- Discrete Hopfield network acts as a vector projector (it projects the probe vector onto the subspace spanned by the training patterns).
- Underlying dynamics drive the network to converge to one of the corners of the unit hypercube.
- Spurious states are those corners of the hypercube that do not belong to the training pattern set.
Storage capacity
Given a probe equal to a stored pattern $\xi_\nu$, the activation (induced local field) of the $j$th neuron can be decomposed into a signal term and a noise term:
\[v_j = \xi_{\nu,j}+\frac{1}{N}\sum_{\mu=1,\mu \ne \nu}^{M}\xi_{\mu,j} \sum_{i=1}^{N}\xi_{\mu,i} \xi_{\nu,i}\]
signal-to-noise ratio is defined as:
\[\rho = \frac{\mbox{variance of signal}}{\mbox{variance of noise}} = \frac{1}{(M-1)/N} \approx \frac{N}{M}\]
The load parameter $\alpha = M/N$ is the reciprocal of $\rho$; for reliable retrieval, $\alpha$ should be less than the critical value of about 0.14.
For almost error-free performance, storage capacity is
\[M_c = \frac{N}{2\log_e N}\]
Thus the storage capacity of the Hopfield network scales less than linearly with the size $N$ of the network; this is a major limitation of the Hopfield model.
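As a quick illustrative calculation (the network size is an assumption for the example): for $N = 1000$ neurons,
\[M_c = \frac{1000}{2\ln 1000} \approx \frac{1000}{13.8} \approx 72,\]
so only about 72 patterns can be stored almost error-free.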
Cohen-Grossberg Theorem
Cohen and Grossberg (1983) showed how to assess the stability of a certain class of neural networks described by the coupled nonlinear differential equations:
\[\frac{d}{dt} \mu_j = a_j(\mu_j) \left[b_j(\mu_j)- \sum_{i=1}^{N}c_{ji}\varphi_i(\mu_i)\right], \quad j= 1,2,\dots,N\]
A neural network with the above dynamics admits a Lyapunov function defined as:
\[E = \frac{1}{2}\sum_{i=1}^{N}\sum_{j=1}^{N}c_{ji}\varphi_i(\mu_i)\varphi_j(\mu_j)-\sum_{j=1}^{N}\int_0^{\mu_j}b_j(\lambda)\varphi_j'(\lambda)\,d\lambda\] where \[\varphi_j'(\lambda) = \frac{d}{d\lambda}\varphi_j(\lambda)\]
Conditions to be met:
- The synaptic weights are symmetric: $c_{ji} = c_{ij}$.
- The function $a_j(\mu_j)$ satisfies the nonnegativity condition: $a_j(\mu_j) \ge 0$.
- The nonlinear activation function $\varphi_j(\mu_j)$ needs to follow the monotonicity condition: \[\varphi_j'(\mu_j) = \frac{d}{d\mu_j}\varphi_j(\mu_j) \ge 0\]
- With the above, \[\frac{dE}{dt} \le 0\] ensuring global stability of the system.
- Hopfield model can be seen as a special case of the Cohen-Grossberg Theorem.
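To make the last point concrete, here is one consistent identification (an illustrative sketch, assuming the continuous Hopfield dynamics $C_j \frac{dv_j}{dt} = -\frac{v_j}{R_j} + \sum_{i=1}^{N} w_{ji}\varphi_i(v_i) + I_j$, which is not spelled out in the notes above): take
\[\mu_j = v_j, \quad a_j(\mu_j) = \frac{1}{C_j} \ge 0, \quad b_j(\mu_j) = -\frac{v_j}{R_j} + I_j, \quad c_{ji} = -w_{ji}, \quad \varphi_i(\mu_i) = \varphi_i(v_i).\]
Substituting these into the Cohen-Grossberg equation recovers the Hopfield dynamics, so the general Lyapunov function $E$ above yields the Hopfield energy function.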