Including feedback gives neural networks temporal characteristics; such networks are called **recurrent networks**.

A recurrent network can be stable or unstable.

The main interest is in the **stability** of recurrent networks, the subject of **neurodynamics**.

## Preliminaries: Dynamic systems

- A **dynamic system** is a system whose state varies with time.
- **State-space model**: values of state variables change over time.

### Example:

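- A system of order \(N\).
- \(x_1(t), x_2(t), \ldots, x_N(t)\) are state variables that hold different values under the independent variable \(t\).
- \(\textbf{x}(t) = [x_1(t), x_2(t), \ldots, x_N(t)]^T\) is called the state vector.
- The dynamics of the system are given by \[\frac{d}{dt}x_j(t) = F_j(\textbf{x}(t)), \quad j=1,2,\ldots,N\]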

Or

\[\frac{d}{dt}\textbf{x}(t) = \textbf{F}(\textbf{x}(t))\]

### System type

- Autonomous: \(\textbf{F}\) does not explicitly depend on time.
- Non-autonomous: \(\textbf{F}\) explicitly depends on time.

\(\textbf{F}(\textbf{x})\) can be seen as a velocity vector field: every point \(\textbf{x}\) in the state space is associated with a velocity vector.

## State space

The state-space equation can be viewed as describing the motion of a point in N-dimensional space.

The curve traced by this point over time is called the **trajectory** or the **orbit**. The **tangent vector** to it is the instantaneous velocity.
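As a minimal sketch of how a trajectory arises, the forward Euler loop below steps a point through a two-dimensional state space. The damped-oscillator field, the step size, and all function names are illustrative assumptions, not part of these notes.

```python
def euler_trajectory(F, x0, dt=0.01, steps=1000):
    """Integrate dx/dt = F(x) with forward Euler and return the visited states."""
    x = list(x0)
    trajectory = [tuple(x)]
    for _ in range(steps):
        velocity = F(x)  # tangent vector at the current point
        x = [xi + dt * vi for xi, vi in zip(x, velocity)]
        trajectory.append(tuple(x))
    return trajectory


def damped_oscillator(x):
    """Assumed example field: dx1/dt = x2, dx2/dt = -x1 - 0.5 * x2."""
    position, speed = x
    return [speed, -position - 0.5 * speed]


trajectory = euler_trajectory(damped_oscillator, [1.0, 0.0])
start, end = trajectory[0], trajectory[-1]
```

Plotting the pairs in `trajectory` would show the orbit spiraling toward the origin. Forward Euler is chosen only for brevity; a smaller `dt` or a higher-order integrator gives a more accurate orbit.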

### Solution of the equation

- Existence: \(\textbf{F}(\textbf{x})\) must be continuous in all of its arguments.
- Uniqueness: \(\textbf{F}(\textbf{x})\) must meet the **Lipschitz condition**.

#### Lipschitz condition

Let \(\textbf{x}\) and \(\textbf{u}\) be a pair of vectors in an open set in a normed vector space. A vector function \(\textbf{F}(\textbf{x})\) satisfies the Lipschitz condition if, for some constant \(K\):

\[\Vert \textbf{F}(\textbf{x}) - \textbf{F}(\textbf{u}) \Vert \le K \Vert \textbf{x} - \textbf{u} \Vert\]

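- \(K\) is called the Lipschitz constant for \(\textbf{F}(\textbf{x})\).

If all partial derivatives \(\partial F_i / \partial x_j\) are finite everywhere, \(\textbf{F}(\textbf{x})\) meets the Lipschitz condition.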
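A small numerical illustration, under the assumption of a scalar example: \(\tanh\) has a derivative bounded by 1, so it should satisfy the Lipschitz condition with \(K = 1\). The sampling range and helper name are hypothetical choices for this sketch.

```python
import math
import random


def lipschitz_ratio(F, x, u):
    """Return |F(x) - F(u)| / |x - u| for scalars x != u."""
    return abs(F(x) - F(u)) / abs(x - u)


rng = random.Random(0)  # fixed seed so the check is repeatable
worst = 0.0
for _ in range(10_000):
    x, u = rng.uniform(-5, 5), rng.uniform(-5, 5)
    if x != u:
        worst = max(worst, lipschitz_ratio(math.tanh, x, u))
```

The largest observed ratio `worst` stays at or below 1, which is consistent with \(K = 1\). Sampling cannot prove the bound; it only fails to find a counterexample.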

## Equilibrium states

\(\bar{\textbf{x}}\) is an equilibrium state when:

\[\left.\frac{d\textbf{x}}{dt} \right\vert_{\textbf{x}= \bar{\textbf{x}}} = \textbf{F}(\bar{\textbf{x}})=0\]

**Linearize** using a Taylor expansion around \(\bar{\textbf{x}}\), writing \(\textbf{x}(t) = \bar{\textbf{x}} + \Delta \textbf{x}(t)\):

\[\textbf{F}(\textbf{x}) \approx \textbf{F}(\bar{\textbf{x}}) + \textbf{A} \Delta \textbf{x}(t)\]

where \(\textbf{A}\) is the Jacobian:

\[\textbf{A} = \left.\frac{\partial}{\partial \mathbf{x}} \textbf{F}(\textbf{x}) \right\vert_{\textbf{x} =\bar{\textbf{x}}}\]
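When \(\textbf{F}\) is awkward to differentiate by hand, the Jacobian at an equilibrium can be approximated numerically. The sketch below uses central differences; the damped-pendulum field and the names `numerical_jacobian` and `damped_pendulum` are assumptions made for illustration.

```python
import math


def numerical_jacobian(F, x_bar, eps=1e-6):
    """Approximate A[i][j] = dF_i/dx_j at x_bar with central differences."""
    n = len(x_bar)
    A = [[0.0] * n for _ in range(n)]
    for j in range(n):
        forward = list(x_bar)
        backward = list(x_bar)
        forward[j] += eps
        backward[j] -= eps
        f_plus, f_minus = F(forward), F(backward)
        for i in range(n):
            A[i][j] = (f_plus[i] - f_minus[i]) / (2 * eps)
    return A


def damped_pendulum(x):
    """Assumed example: d(angle)/dt = speed, d(speed)/dt = -sin(angle) - 0.5 * speed."""
    angle, speed = x
    return [speed, -math.sin(angle) - 0.5 * speed]


# The origin is an equilibrium because damped_pendulum([0, 0]) == [0, 0].
A = numerical_jacobian(damped_pendulum, [0.0, 0.0])
```

For this example the analytic Jacobian at the origin is \(\begin{bmatrix} 0 & 1 \\ -1 & -0.5 \end{bmatrix}\), which the numerical result should match closely.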

### Stability of linearized system

The eigenvalues of the Jacobian matrix \(\textbf{A}\) determine the behavior near equilibrium points.

\[\frac{d}{dt}\Delta \textbf{x}(t) \approx \textbf{A} \Delta \textbf{x}(t)\]

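The eigenvalues \(\lambda\) of \(\textbf{A}\) are the values for which a nonzero vector \(\textbf{x}\) satisfies: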

\[(\textbf{A}-\lambda \textbf{I}) \textbf{x} = 0\]

#### Example

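For a second-order system, the type of equilibrium follows from the two eigenvalues of \(\textbf{A}\):

- Stable node (both eigenvalues real and negative)
- Unstable focus (complex conjugate pair, positive real part)
- Saddle point (both real, one positive and one negative)
- Stable focus (complex conjugate pair, negative real part)
- Unstable node (both real and positive)
- Center (complex conjugate pair, zero real part)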
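The classification above can be sketched in a few lines for a \(2 \times 2\) Jacobian, using the trace and determinant to obtain the eigenvalues. The function name, the tolerance, and the example matrices are assumptions for illustration.

```python
import cmath


def classify_equilibrium(A, tol=1e-12):
    """Classify the equilibrium of a second-order system from its 2x2 Jacobian A."""
    (a, b), (c, d) = A
    trace = a + d
    det = a * d - b * c
    root = cmath.sqrt(trace * trace - 4 * det)
    lam1 = (trace + root) / 2
    lam2 = (trace - root) / 2
    if abs(lam1.imag) > tol:  # complex conjugate pair
        if lam1.real < -tol:
            return "stable focus"
        if lam1.real > tol:
            return "unstable focus"
        return "center"
    r1, r2 = lam1.real, lam2.real  # both eigenvalues are real
    if r1 < -tol and r2 < -tol:
        return "stable node"
    if r1 > tol and r2 > tol:
        return "unstable node"
    if r1 * r2 < 0:
        return "saddle point"
    return "degenerate (zero eigenvalue)"


print(classify_equilibrium([[0.0, 1.0], [-1.0, -0.5]]))  # stable focus
print(classify_equilibrium([[1.0, 0.0], [0.0, -1.0]]))   # saddle point
```

The last return value covers the case of a zero eigenvalue, where the linearization alone does not decide stability.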

### Definition of stability

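The equilibrium state \(\bar{\textbf{x}}\) is:

- **Uniformly stable**: if, for an arbitrary positive \(\epsilon\), there exists a positive \(\delta\) such that \(\Vert \textbf{x}(0) - \bar{\textbf{x}} \Vert < \delta\) implies \(\Vert \textbf{x}(t) - \bar{\textbf{x}} \Vert < \epsilon\) for all \(t > 0\).
- **Convergent**: if there exists a positive \(\delta\) such that \(\Vert \textbf{x}(0) - \bar{\textbf{x}} \Vert < \delta\) implies \(\textbf{x}(t) \to \bar{\textbf{x}}\) as \(t \to \infty\).
- **Asymptotically stable**: if it is both stable and convergent.
- **Globally asymptotically stable**: if it is stable and all trajectories of the system converge to \(\bar{\textbf{x}}\) as time approaches infinity.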

### Lyapunov’s Theorem

- **Theorem 1**: The equilibrium state \(\bar{\textbf{x}}\) is stable if, in a small neighborhood of \(\bar{\textbf{x}}\), there exists a positive definite function \(V(\textbf{x})\) such that its derivative with respect to time is negative semidefinite in that region.
- **Theorem 2**: The equilibrium state \(\bar{\textbf{x}}\) is asymptotically stable if, in a small neighborhood of \(\bar{\textbf{x}}\), there exists a positive definite function \(V(\textbf{x})\) such that its derivative with respect to time is negative definite in that region.
- A scalar function \(V(\textbf{x})\) that satisfies these conditions is called a **Lyapunov function** for the equilibrium state \(\bar{\textbf{x}}\).
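To make the theorems concrete, the sketch below checks a candidate Lyapunov function numerically. The damped-oscillator system and the quadratic candidate \(V(\textbf{x}) = \frac{1}{2}(x_1^2 + x_2^2)\) are assumed examples; for this pair, \(dV/dt = -0.5\,x_2^2\), which is negative semidefinite, so Theorem 1 applies at the origin.

```python
import random


def F(x):
    """Assumed example system: dx1/dt = x2, dx2/dt = -x1 - 0.5 * x2."""
    return [x[1], -x[0] - 0.5 * x[1]]


def V(x):
    """Candidate Lyapunov function, positive definite around the origin."""
    return 0.5 * (x[0] ** 2 + x[1] ** 2)


def V_dot(x):
    """dV/dt along trajectories: the gradient of V (here x itself) dotted with F(x)."""
    return sum(g * f for g, f in zip(x, F(x)))


rng = random.Random(0)  # fixed seed so the check is repeatable
samples = [[rng.uniform(-1, 1), rng.uniform(-1, 1)] for _ in range(1000)]
largest_v_dot = max(V_dot(x) for x in samples)
smallest_v = min(V(x) for x in samples)
```

`largest_v_dot` should never be meaningfully above zero and `smallest_v` should be positive. A sampled check like this only supports the analytic argument; it does not replace it.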

## Attractors

- Dissipative systems are characterized by attracting sets or manifolds of dimensionality lower than that of the embedding space. These are called **attractors**.
- Regions of initial conditions of nonzero state-space volume converge to these attractors as time increases.

### Type

- Point attractors
- Limit cycle attractors
- Strange (chaotic) attractors

## Neurodynamical models

We will focus on state variables that are continuous-valued.

Properties:

- Large number of degrees of freedom
- Nonlinearity
- Dissipative (opposed to conservative), i.e. open system
- Noise

### Manipulation of attractors

As a recurrent neural network paradigm:

- Can identify attractors with computational objects
- In order to do so, must exercise control over the location of the attractors in the state space of the system
- A learning algorithm manipulates the equations governing the dynamical behavior so that the attractors are placed at the desired locations
- One good way to do this is to use the **energy minimization** paradigm (e.g., Hopfield)

## Hopfield model

- \(N\) units, each connected to every other unit (no self-feedback)
- Given \(M\) input patterns, each having the same dimensionality as the network, the patterns can be memorized as attractors of the network
- Starting with an initial pattern, the dynamics will converge toward the attractor of the basin of attraction in which the initial pattern was placed.

### Discrete Hopfield model

Based on the McCulloch-Pitts model.

The energy function is defined as

\[E = - \frac{1}{2}\sum_{i=1}^{N}\sum_{j=1, j \ne i}^{N} w_{ji} x_i x_j\]

The network dynamics evolve in the direction that minimizes \(E\).
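The energy function can be evaluated directly, and a single asynchronous sign update of one neuron should never increase it when the weights are symmetric. The sketch below uses a hypothetical three-neuron weight matrix and bipolar states \(\pm 1\); the names `energy` and `update_neuron` are illustrative.

```python
def energy(W, x):
    """E = -1/2 * sum over i != j of w_ji * x_i * x_j."""
    n = len(x)
    return -0.5 * sum(
        W[j][i] * x[i] * x[j]
        for i in range(n)
        for j in range(n)
        if i != j
    )


def update_neuron(W, x, j):
    """Return a copy of x with neuron j set to the sign of its local field."""
    field = sum(W[j][i] * x[i] for i in range(len(x)) if i != j)
    updated = list(x)
    if field != 0:  # leave the state unchanged when the field is exactly zero
        updated[j] = 1 if field > 0 else -1
    return updated


# Hypothetical symmetric weights with zero diagonal.
W = [[0, 1, -1],
     [1, 0, 1],
     [-1, 1, 0]]
before = [1, -1, 1]
after = update_neuron(W, before, 1)
print(energy(W, before), energy(W, after))  # 3.0 -1.0
```

Here neuron 1 sees a local field of 2, flips to +1, and the energy drops from 3.0 to -1.0.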

### Content-Addressable Memory

- Map a set of patterns to be memorized \(\xi_\mu\) onto fixed points \(\textbf{x}_\mu\) of the dynamical system realized by the recurrent network
- Encoding: \(\xi_\mu\) to \(\textbf{x}_\mu\)
- Decoding: \(\textbf{x}_\mu\) to \(\xi_\mu\)

### Storage

Learning is similar to Hebbian learning:

\[w_{ji}=\frac{1}{N}\sum_{\mu=1}^{M}\xi_{\mu,j} \xi_{\mu,i}\]

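with \(w_{ii} = 0\) (no self-feedback).

Matrix form:

\[\textbf{W}=\frac{1}{N}\sum_{\mu=1}^{M} \xi_\mu \xi_\mu^T - \frac{M}{N} \textbf{I}\]

The subtracted term removes the diagonal, since each \(\xi_{\mu,i}^2 = 1\) contributes \(1/N\) to \(w_{ii}\). The resulting \(\textbf{W}\) is symmetric.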

### Activation

- Initialize the network with the **probe pattern**: \[x_j(0) = \xi_{\text{probe},j}\]
- Update the output of each neuron (picking them at random) as \[x_j(n+1)=\operatorname{sgn}\left( \sum_{i=1}^{N} w_{ji}x_i(n) \right)\] until \(\textbf{x}\) reaches a fixed point.
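The storage rule and the asynchronous update above can be sketched together as follows. This assumes bipolar patterns with entries \(\pm 1\) and keeps a neuron's state unchanged when its local field is exactly zero; the two eight-dimensional patterns and the function names are hypothetical.

```python
import random


def hebbian_weights(patterns):
    """w_ji = (1/N) * sum over patterns of xi_j * xi_i, with a zero diagonal."""
    n = len(patterns[0])
    W = [[0.0] * n for _ in range(n)]
    for xi in patterns:
        for j in range(n):
            for i in range(n):
                if i != j:
                    W[j][i] += xi[j] * xi[i] / n
    return W


def recall(W, probe, rng, max_sweeps=100):
    """Update neurons in random order until a full sweep changes nothing."""
    x = list(probe)
    n = len(x)
    for _ in range(max_sweeps):
        changed = False
        order = list(range(n))
        rng.shuffle(order)
        for j in order:
            field = sum(W[j][i] * x[i] for i in range(n))
            new_state = 1 if field > 0 else -1 if field < 0 else x[j]
            if new_state != x[j]:
                x[j] = new_state
                changed = True
        if not changed:  # fixed point reached
            break
    return x


stored = [
    [1, 1, 1, 1, -1, -1, -1, -1],
    [1, -1, 1, -1, 1, -1, 1, -1],
]
W = hebbian_weights(stored)
probe = [-1, 1, 1, 1, -1, -1, -1, -1]  # first pattern with one bit flipped
recalled = recall(W, probe, random.Random(0))
```

With these two orthogonal patterns, the corrupted probe lies in the basin of attraction of the first stored pattern, so `recalled` equals `stored[0]`.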

### Spurious states

- \(\textbf{W}\) is symmetric, thus its eigenvalues are all real.
- For a large number of patterns \(M\), the matrix is degenerate, i.e., several eigenvectors can have the same eigenvalue.
- These eigenvectors form a subspace, and when the associated eigenvalue is 0, it is called a *null* space.
- The null space exists because \(M\) is smaller than the number of neurons \(N\).
- Hopfield network as content-addressable memory:
  - The discrete Hopfield network acts as a vector projector (it projects the probe vector onto the subspace spanned by the training patterns).
  - The underlying dynamics drive the network to converge to one of the corners of the unit hypercube.
- Spurious states are those corners of the hypercube that do not belong to the training pattern set.

### Storage capacity

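Given a probe equal to the stored pattern \(\xi_\nu\), the activation of the \(j\)th neuron can be decomposed into a signal term and a noise term:

\[v_j = \xi_{\nu,j}+\frac{1}{N}\sum_{\mu=1,\mu \ne \nu}^{M}\xi_{\mu,j} \sum_{i=1}^{N}\xi_{\mu,i} \xi_{\nu,i}\]

The first term is the signal (the desired output), and the second is crosstalk noise from the other stored patterns.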

The **signal-to-noise ratio** is defined as:

\[\rho = \frac{\mbox{variance of signal}}{\mbox{variance of noise}} = \frac{1}{(M-1)/N} \approx \frac{N}{M}\]

The **load parameter** \(\alpha = M/N\) is the reciprocal of \(\rho\). For reliable recall, \(\alpha\) should be less than the critical value \(\alpha_c \approx 0.14\).

For almost error-free performance, storage capacity is

\[M_c = \frac{N}{2\log_e N}\]

Thus the storage capacity of the Hopfield network scales less than linearly with the size \(N\) of the network. This is a major **limitation** of the Hopfield model.
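To get a feel for the numbers, the two capacity estimates from this section can be evaluated for a few network sizes. The function names and the chosen sizes are illustrative assumptions.

```python
import math


def critical_load_capacity(n, alpha_c=0.14):
    """Number of patterns at the critical load parameter: M = alpha_c * N."""
    return alpha_c * n


def almost_error_free_capacity(n):
    """M_c = N / (2 * ln N), the capacity for almost error-free recall."""
    return n / (2 * math.log(n))


for n in (100, 1000, 10000):
    m_c = almost_error_free_capacity(n)
    print(n, round(critical_load_capacity(n)), round(m_c, 1), round(m_c / n, 3))
```

The last column, \(M_c / N\), shrinks as \(N\) grows, which is the sublinear scaling described above.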

## Cohen-Grossberg Theorem

Cohen and Grossberg (1983) showed how to assess the stability of a certain class of neural networks, described by:

\[\frac{d}{dt} \mu_j = a_j(\mu_j) \left[b_j(\mu_j)- \sum_{i=1}^{N}c_{ji}\varphi_i(\mu_i)\right], \quad j= 1,2,\ldots,N\]

Neural network with above dynamics admits a Lyapunov function defined as:

\[E = \frac{1}{2}\sum_{i=1}^{N}\sum_{j=1}^{N}c_{ji}\varphi_i(\mu_i)\varphi_j(\mu_j)-\sum_{j=1}^{N}\int_0^{\mu_j}b_j(\lambda)\varphi_j'(\lambda)\,d\lambda\]

where

\[\varphi_j'(\lambda) = \frac{d}{d\lambda}\varphi_j(\lambda)\]

### Conditions to be met:

- Synaptic weights are symmetric: \[c_{ij} = c_{ji}\]
- Function \(a_j(\mu_j)\) satisfies the condition for *nonnegativity*: \[a_j(\mu_j) \ge 0\]
- The nonlinear activation function \(\varphi_j(\mu_j)\) needs to follow the *monotonicity condition*: \[\varphi_j'(\mu_j) = \frac{d}{d\mu_j}\varphi_j(\mu_j) \ge 0\]
- With the above, \[\frac{dE}{dt} \le 0\] ensuring global stability of the system.
- The Hopfield model can be seen as a special case of the Cohen-Grossberg theorem.
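A brief sketch of why these conditions give \(dE/dt \le 0\), using only the dynamics and the energy function defined above (the index \(k\) is introduced here just for the derivation). Differentiating \(E\) with respect to \(\mu_k\) and using the symmetry of the weights:

\[\frac{\partial E}{\partial \mu_k} = \varphi_k'(\mu_k)\left[\sum_{i=1}^{N} c_{ki}\varphi_i(\mu_i) - b_k(\mu_k)\right]\]

Applying the chain rule and substituting the network dynamics for \(d\mu_k/dt\):

\[\frac{dE}{dt} = \sum_{k=1}^{N}\frac{\partial E}{\partial \mu_k}\frac{d\mu_k}{dt} = -\sum_{k=1}^{N} a_k(\mu_k)\,\varphi_k'(\mu_k)\left[b_k(\mu_k) - \sum_{i=1}^{N} c_{ki}\varphi_i(\mu_i)\right]^2\]

Each term in the sum is a product of a nonnegative \(a_k\), a nonnegative \(\varphi_k'\), and a square, so \(dE/dt \le 0\).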