## TAMU Neural Network 8 Neurodynamics

2017-04-16

Inclusion of feedback gives temporal characteristics to neural networks: recurrent networks

Recurrent networks can become unstable or stable.

Main interest is in recurrent network’s stability: neurodynamics

## Preliminaries: Dynamic systems

• A dynamic system is a system whose state varies with time
• State-space model: values of state variables change over time

### Example:

• A system of order $N$
• $x_1(t),x_2(t),...,x_N(t)$ are state variables that hold different values under independent variable $t$.
• $\textbf{x}(t)$ is called the state vector
• The dynamics of the system are $\frac{d}{dt}x_j(t) = F_j(\textbf{x}(t)), \quad j=1,2,\ldots,N$

Or
$\frac{d}{dt}\textbf{x}(t) = \textbf{F}(\textbf{x}(t))$

### System type

• Autonomous: $\textbf{F}(\centerdot)$ does not explicitly depend on time.
• Non-autonomous: $\textbf{F}(\centerdot)$ explicitly depends on time.

$\textbf{F}(\textbf{x})$ can be seen as a velocity vector field. In this field, every point is associated with a vector.

## State space

Can view the state-space equation $\frac{d \textbf{x}}{dt} =\textbf{F}(\textbf{x})$ as describing the motion of a point in N-dimensional space.

The curve traced out by this point over time is called the trajectory or the orbit. The tangent vector at each point of the trajectory is the instantaneous velocity.
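A trajectory can be approximated numerically by repeatedly stepping along the velocity vector. The sketch below uses forward Euler integration, which is a choice made here for brevity rather than anything prescribed by these notes; the damped oscillator and the names `euler_trajectory` and `damped_oscillator` are hypothetical illustrations.

```python
def euler_trajectory(F, x0, dt=0.01, steps=1000):
    """Approximate the orbit of dx/dt = F(x) with forward Euler steps."""
    x = list(x0)
    trajectory = [tuple(x)]
    for _ in range(steps):
        velocity = F(x)  # tangent (velocity) vector at the current state
        x = [xi + dt * vi for xi, vi in zip(x, velocity)]
        trajectory.append(tuple(x))
    return trajectory


def damped_oscillator(x):
    """Illustrative autonomous 2nd-order system: x1' = x2, x2' = -x1 - 0.5*x2."""
    x1, x2 = x
    return [x2, -x1 - 0.5 * x2]


orbit = euler_trajectory(damped_oscillator, x0=(1.0, 0.0), dt=0.01, steps=3000)
start, end = orbit[0], orbit[-1]
print("start:", start, "end:", end)
```

Because the example system is autonomous, `F` takes only the state as its argument; a non-autonomous system would also need the time `t`.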

### Solution of the equation

• Existence: a solution exists if $\textbf{F}(\textbf{x})$ is continuous in all of its arguments.
• Uniqueness: the solution is unique if $\textbf{F}(\textbf{x})$ also satisfies the Lipschitz condition.

#### Lipschitz condition

Let $\textbf{x}$ and $\textbf{u}$ be a pair of vectors in an open set $\mathcal{M}$ in a normed vector space. A vector function $\textbf{F}(\textbf{x})$ satisfies the Lipschitz condition if there exists a constant $K$ such that, for all $\textbf{x}$ and $\textbf{u}$ in $\mathcal{M}$:

$\Vert \textbf{F}(\textbf{x}) - \textbf{F}(\textbf{u}) \Vert \le K \Vert \textbf{x} - \textbf{u} \Vert$

• $K$ is called the Lipschitz constant for $\textbf{F}(\textbf{x})$

If all partial derivatives $\partial F_i/\partial x_j$ are finite everywhere, $\textbf{F}(\textbf{x})$ meets the Lipschitz condition.
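As an illustration (the functions here are chosen for this example and are not taken from the notes), consider the scalar case $F(x) = \tanh(x)$. Its derivative satisfies $0 < F'(x) \le 1$ everywhere, so by the mean value theorem

$\vert \tanh(x) - \tanh(u) \vert \le 1 \cdot \vert x - u \vert$

and $K = 1$ is a Lipschitz constant. In contrast, $F(x) = \sqrt{\vert x \vert}$ is continuous but its slope is unbounded near $x = 0$, so no finite $K$ works there. Uniqueness then fails: with $x(0) = 0$, both $x(t) = 0$ and $x(t) = t^2/4$ (for $t \ge 0$) solve $\frac{dx}{dt} = \sqrt{\vert x \vert}$.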

## Equilibrium states

$\bar{\textbf{x}} \in \mathcal{M}$ is an equilibrium state when:

$\left.\frac{d\textbf{x}}{dt} \right\vert_{\textbf{x}= \bar{\textbf{x}}} = \textbf{F}(\bar{\textbf{x}})=0$

Linearize $\textbf{F}(\textbf{x})$ using Taylor expansion around $\bar{\textbf{x}}$

$\textbf{F}(\textbf{x}) \approx \textbf{F}(\bar{\textbf{x}}) + \textbf{A} \Delta \textbf{x}(t)$

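where $\Delta \textbf{x}(t) = \textbf{x}(t) - \bar{\textbf{x}}$ is the deviation from equilibrium and $\textbf{A}$ is the Jacobian of $\textbf{F}$ evaluated at $\bar{\textbf{x}}$: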

$\textbf{A} = \left.\frac{\partial}{\partial \mathbf{x}} \textbf{F}(\textbf{x}) \right\vert_{\textbf{x} =\bar{\textbf{x}}}$

### Stability of linearized system

The eigenvalues of the Jacobian matrix $\textbf{A}$ determine the behavior near an equilibrium point. Since $\textbf{F}(\bar{\textbf{x}}) = 0$ and $\bar{\textbf{x}}$ is constant:

$\frac{d}{dt}\Delta \textbf{x}(t) \approx \textbf{A} \Delta \textbf{x}(t)$

The eigenvalues $\lambda$ of $\textbf{A}$ are the values for which a nonzero vector $\textbf{q}$ satisfies:

$(\textbf{A}-\lambda \textbf{I}) \textbf{q} = 0$
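A worked illustration, using a damped pendulum that is assumed here purely as an example (it is not part of the original notes): let $\frac{d}{dt}x_1 = x_2$ and $\frac{d}{dt}x_2 = -\sin x_1 - c\,x_2$ with damping $c > 0$. The equilibrium states are $(0,0)$ and $(\pi,0)$. At $(0,0)$:

$\textbf{A} = \begin{bmatrix} 0 & 1 \\ -1 & -c \end{bmatrix}, \quad \lambda^2 + c\lambda + 1 = 0, \quad \lambda = \frac{-c \pm \sqrt{c^2 - 4}}{2}$

For $0 < c < 2$ the eigenvalues are complex with negative real part $-c/2$, so small deviations spiral back toward the equilibrium. At $(\pi,0)$:

$\textbf{A} = \begin{bmatrix} 0 & 1 \\ 1 & -c \end{bmatrix}, \quad \lambda = \frac{-c \pm \sqrt{c^2 + 4}}{2}$

Here one eigenvalue is positive and the other negative, so almost every small deviation grows.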

#### Example

2nd-Order System

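• Stable node (both eigenvalues real and negative)
• Unstable focus (complex conjugate pair, positive real part)
• Saddle point (both real, one positive and one negative)
• Stable focus (complex conjugate pair, negative real part)
• Unstable node (both eigenvalues real and positive)
• Center (complex conjugate pair, zero real part)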

### Definition of stability

• Uniformly stable: for an arbitrary $\epsilon > 0$, there exists a positive $\delta$ such that $\Vert \textbf{x}(0) - \bar{\textbf{x}} \Vert < \delta$ implies $\Vert \textbf{x}(t) - \bar{\textbf{x}} \Vert < \epsilon$ for all $t > 0$.
• Convergent: there exists a positive $\delta$ such that $\Vert \textbf{x}(0) - \bar{\textbf{x}} \Vert < \delta$ implies $\textbf{x}(t) \rightarrow \bar{\textbf{x}} \mbox { as } t \rightarrow \infty$
• Asymptotically stable: both stable and convergent
• Globally asymptotically stable: if stable and all trajectories of the system converge to $\bar{\textbf{x}}$ as time $t$ approaches infinity.

### Lyapunov’s Theorem

• Theorem 1: The equilibrium state $\bar{\textbf{x}}$ is stable if in a small neighborhood of $\bar{\textbf{x}}$ there exists a positive definite function $V(\textbf{x})$ such that its derivative with respect to time is negative semidefinite in that region.
• Theorem 2: The equilibrium state $\bar{\textbf{x}}$ is asymptotically stable if in a small neighborhood of $\bar{\textbf{x}}$ there exists a positive definite function $V(\textbf{x})$ such that its derivative with respect to time is negative definite in that region.
• A scalar function $V(\textbf{x})$ that satisfies these conditions is called a Lyapunov function for the equilibrium state $\bar{\textbf{x}}$
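Two small worked cases may help; both systems are assumed here for illustration only. For the damped linear oscillator $\frac{d}{dt}x_1 = x_2$, $\frac{d}{dt}x_2 = -x_1 - c\,x_2$ with $c > 0$, take $V(\textbf{x}) = \frac{1}{2}(x_1^2 + x_2^2)$, which is positive definite. Then

$\frac{dV}{dt} = x_1 \frac{dx_1}{dt} + x_2 \frac{dx_2}{dt} = x_1 x_2 + x_2(-x_1 - c\,x_2) = -c\,x_2^2 \le 0$

This is only negative semidefinite (it vanishes wherever $x_2 = 0$), so this particular $V$ establishes stability through Theorem 1 but not asymptotic stability through Theorem 2.

For the scalar system $\frac{dx}{dt} = -x^3$, take $V(x) = \frac{1}{2}x^2$. Then

$\frac{dV}{dt} = x \frac{dx}{dt} = -x^4 < 0 \mbox{ for } x \ne 0$

which is negative definite, so $\bar{x} = 0$ is asymptotically stable by Theorem 2. The linearization cannot decide this case, because the Jacobian at $\bar{x} = 0$ is zero.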

## Attractors

• Dissipative systems are characterized by attracting sets or manifolds of dimensionality lower than that of the embedding space. These are called attractors.
• Regions of initial conditions of nonzero state space volume converge to these attractors as time $t$ increases.

### Type

• Point attractors
• Limit cycle attractors
• Strange (chaotic) attractors

## Neurodynamical models

We will focus on state variables that are continuous-valued.

Properties:

• Large number of DOF
• Nonlinearity
• Dissipative (opposed to conservative), i.e. open system
• Noise

### Manipulation of attractors

• Can identify attractors with computational objects
• In order to do so, must exercise control over the location of the attractors in the state space of the system
• A learning algorithm manipulates the equations governing the dynamical behavior so that the attractors are placed at the desired locations
• One good way to do this is to use the energy minimization paradigm. (e.g. Hopfield)

## Hopfield model

• $N$ units, each connected to every other unit (no self-feedback)
• $M$ input patterns, each with the same dimensionality as the network, can be memorized as attractors of the network
• Starting from an initial pattern, the dynamics converge toward the attractor whose basin of attraction contains the initial pattern.

### Discrete Hopfield model

Based on McCulloch-Pitts model

Energy function is defined as

$E = - \frac{1}{2}\sum_{i=1}^{N}\sum_{j=1, j \ne i}^{N} w_{ji} x_i x_j$

Network dynamics will evolve in the direction that minimizes $E$

• Map a set of patterns to be memorized $\xi_\mu$ onto fixed point $\textbf{x}_\mu$ in the dynamical system realized by the recurrent network
• Encoding: $\xi_\mu$ to $\textbf{x}_\mu$
• Decoding: $\textbf{x}_\mu$ to $\xi_\mu$

### Storage

Learning is similar to Hebbian

$w_{ji}=\frac{1}{N}\sum_{\mu=1}^{M}\xi_{\mu,j} \xi_{\mu,i}$

with $w_{ji} = 0 \mbox{ if } i=j$

Matrix form:

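$\textbf{W}=\frac{1}{N}\sum_{\mu=1}^{M} \xi_\mu \xi_\mu^T - \frac{M}{N} \textbf{I}$

The second term removes the diagonal: each element of $\xi_\mu$ is $\pm 1$, so every diagonal entry of $\frac{1}{N}\sum_{\mu} \xi_\mu \xi_\mu^T$ equals $M/N$. The resulting $\textbf{W}$ is symmetric.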
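The storage rule can be sketched in a few lines of plain Python. The function name `hebbian_weights` and the two example patterns are assumptions made for illustration.

```python
def hebbian_weights(patterns):
    """Outer-product rule: w_ji = (1/N) * sum over mu of xi_mu[j] * xi_mu[i], with w_jj = 0."""
    n = len(patterns[0])
    weights = [[0.0] * n for _ in range(n)]
    for j in range(n):
        for i in range(n):
            if i != j:  # no self-feedback
                weights[j][i] = sum(p[j] * p[i] for p in patterns) / n
    return weights


# Two illustrative bipolar (+1/-1) patterns of dimension N = 6.
patterns = [
    [1, -1, 1, -1, 1, -1],
    [1, 1, 1, -1, -1, -1],
]
W = hebbian_weights(patterns)
for row in W:
    print(row)
```

Each off-diagonal weight counts how often two units agree across the stored patterns, which is why swapping the indices leaves the value unchanged.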

### Activation

• Initialize the network with the probe pattern $\mathbf{\xi}_{probe}$: $x_j(0) = \xi_{probe,j}$
• Update the output of each neuron, picking neurons in random order, as $x_j(n+1)=\operatorname{sgn}\left( \sum_{i=1}^{N} w_{ji}x_i(n) \right)$ until $\textbf{x}$ reaches a fixed point.
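A minimal sketch of the recall procedure, combined with the energy function so that the descent in $E$ can be observed. The helper names, the two orthogonal example patterns, the fixed random seed, and the convention of leaving a neuron unchanged when its input sum is exactly zero are all assumptions of this sketch.

```python
import random


def hebbian_weights(patterns):
    """Hebbian outer-product weights with zero diagonal."""
    n = len(patterns[0])
    return [[0.0 if i == j else sum(p[j] * p[i] for p in patterns) / n
             for i in range(n)] for j in range(n)]


def energy(weights, x):
    """E = -1/2 * sum_i sum_j w_ji x_i x_j (the diagonal of W is zero)."""
    n = len(x)
    return -0.5 * sum(weights[j][i] * x[i] * x[j]
                      for j in range(n) for i in range(n))


def recall(weights, probe, rng, max_sweeps=100):
    """Asynchronous updates in random order until no neuron changes."""
    x = list(probe)
    n = len(x)
    energies = [energy(weights, x)]
    for _ in range(max_sweeps):
        changed = False
        order = list(range(n))
        rng.shuffle(order)
        for j in order:
            field = sum(weights[j][i] * x[i] for i in range(n))
            # Assumed convention: keep the old state when the field is zero.
            new_state = x[j] if field == 0 else (1 if field > 0 else -1)
            if new_state != x[j]:
                x[j] = new_state
                changed = True
        energies.append(energy(weights, x))
        if not changed:  # fixed point reached
            break
    return x, energies


stored = [
    [1, 1, 1, 1, -1, -1, -1, -1],
    [1, -1, 1, -1, 1, -1, 1, -1],
]
W = hebbian_weights(stored)
probe = [-1, 1, 1, 1, -1, -1, -1, -1]  # first pattern with one element flipped
recalled, energies = recall(W, probe, random.Random(0))
print(recalled, energies)
```

With symmetric weights and no self-feedback, each asynchronous update can only lower $E$ or leave it unchanged, so the recorded energies form a non-increasing sequence.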

### Spurious states

• $\textbf{W}$ is symmetric, thus eigenvalues of it are all real.
• For a large number of patterns $M$, the matrix is degenerate, i.e., several eigenvectors can have the same eigenvalue.
• These eigenvectors form a subspace, and when the associated eigenvalue is 0, it is called a null space.
• This is due to $M$ being smaller than the number of neurons $N$.
• Hopfield network as content addressable memory:
• Discrete Hopfield network acts as a vector projector (project probe vector onto subspace spanned by training patterns).
• Underlying dynamics drive the network to converge to one of the corners of the unit hypercube.
• Spurious states are those corners of the hypercube that do not belong to the training pattern set.
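For a small network, the fixed points can be found by brute force over every corner of the hypercube. The sketch below assumes bipolar ($\pm 1$) states, two hypothetical orthogonal patterns, and the convention that a neuron with zero input keeps its state. It confirms at least one well-known kind of spurious state: the negated copy of each stored pattern.

```python
from itertools import product


def hebbian_weights(patterns):
    """Hebbian outer-product weights with zero diagonal."""
    n = len(patterns[0])
    return [[0.0 if i == j else sum(p[j] * p[i] for p in patterns) / n
             for i in range(n)] for j in range(n)]


def is_fixed_point(weights, x):
    """True if no asynchronous update would change any element of x."""
    n = len(x)
    for j in range(n):
        field = sum(weights[j][i] * x[i] for i in range(n))
        if field * x[j] < 0:  # the update would flip x_j
            return False
    return True


stored = [
    (1, 1, 1, 1, -1, -1, -1, -1),
    (1, -1, 1, -1, 1, -1, 1, -1),
]
W = hebbian_weights(stored)

# Enumerate all 2^8 corners and keep the stable ones.
fixed_points = [x for x in product((-1, 1), repeat=8) if is_fixed_point(W, x)]
spurious = [x for x in fixed_points if x not in stored]
negated = [tuple(-v for v in p) for p in stored]
print(len(fixed_points), "fixed points,", len(spurious), "of them spurious")
```

Negated patterns are always stable when the originals are, because flipping every sign in the state flips every input sum as well.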

### Storage capacity

Given a probe equal to the stored pattern $\xi_v$, the activation of the $j$th neuron can be decomposed into the signal term and the noise term.

$v_j = \sum_{i=1}^{N} w_{ji}\xi_{v,i} = \xi_{v,j}+\frac{1}{N}\sum_{\mu=1,\mu \ne v}^{M}\xi_{\mu,j} \sum_{i=1}^{N}\xi_{\mu,i} \xi_{v,i}$

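The first term is the signal and the second is the noise caused by crosstalk from the other stored patterns. The signal-to-noise ratio is defined as: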

$\rho = \frac{\mbox{variance of signal}}{\mbox{variance of noise}} = \frac{1}{(M-1)/N} \approx \frac{N}{M}$

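The load parameter $\alpha = M/N$ is the reciprocal of $\rho$. Recall quality degrades as $\alpha$ grows and breaks down near the critical value $\alpha_c \approx 0.14$, so $\alpha$ should be kept below 0.14.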

For almost error-free performance, storage capacity is

$M_c = \frac{N}{2\log_e N}$

Thus the storage capacity of Hopfield network scales less than linearly with size N of the network. It’s a major limitation of the Hopfield model.
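To get a feel for the numbers, the two capacity estimates can be compared for a few network sizes. This is a small sketch; the function names and the chosen sizes are illustrative assumptions.

```python
import math


def almost_error_free_capacity(n):
    """M_c = N / (2 ln N), the almost error-free storage capacity."""
    return n / (2.0 * math.log(n))


def critical_load_capacity(n, alpha_c=0.14):
    """Number of patterns at the critical load parameter alpha_c = M / N."""
    return alpha_c * n


for n in (100, 1000, 10000):
    m_c = almost_error_free_capacity(n)
    print(f"N={n}: M_c={m_c:.1f} ({m_c / n:.3f} per neuron), "
          f"0.14*N={critical_load_capacity(n):.0f}")
```

The per-neuron figure $M_c/N = 1/(2\log_e N)$ shrinks as $N$ grows, which is the sublinear scaling noted above.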

## Cohen-Grossberg Theorem

Cohen and Grossberg (1983) showed how to assess the stability of a certain class of neural networks, described by:

$\frac{d}{dt} \mu_j = a_j(\mu_j) \left[b_j(\mu_j)- \sum_{i=1}^{N}c_{ji}\varphi_i(\mu_i)\right], \quad j= 1,2,\ldots,N$

A neural network with the above dynamics admits a Lyapunov function defined as:

$E = \frac{1}{2}\sum_{i=1}^{N}\sum_{j=1}^{N}c_{ji}\varphi_i(\mu_i)\varphi_j(\mu_j)-\sum_{j=1}^{N}\int_0^{\mu_j}b_j(\lambda)\varphi_j'(\lambda)d\lambda$ where $\varphi_j'(\lambda) = \frac{d}{d\lambda}(\varphi_j(\lambda))$

### Conditions to be met:

• Synaptic weights are symmetric: $c_{ij} = c_{ji}$
• Function $a_j(\mu_j)$ satisfies the nonnegativity condition: $a_j(\mu_j) \ge 0$
• The nonlinear activation function $\varphi_j(\mu_j)$ satisfies the monotonicity condition: $\varphi_j'(\mu_j) = \frac{d}{d\mu_j}\varphi_j(\mu_j) \ge 0$
• When all three conditions hold, $\frac{dE}{dt} \le 0$, which ensures global stability of the system
• Hopfield model can be seen as a special case of the Cohen-Grossberg Theorem.
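The reason $E$ cannot increase can be checked directly with the chain rule. The steps below are reconstructed from the definitions in this section and are offered as a sketch rather than quoted from the original paper. Using the symmetry $c_{ij} = c_{ji}$:

$\frac{\partial E}{\partial \mu_j} = \varphi_j'(\mu_j)\sum_{i=1}^{N}c_{ji}\varphi_i(\mu_i) - b_j(\mu_j)\varphi_j'(\mu_j) = -\varphi_j'(\mu_j)\left[b_j(\mu_j)-\sum_{i=1}^{N}c_{ji}\varphi_i(\mu_i)\right]$

Substituting the dynamics $\frac{d}{dt}\mu_j = a_j(\mu_j)\left[b_j(\mu_j)-\sum_{i=1}^{N}c_{ji}\varphi_i(\mu_i)\right]$ gives

$\frac{dE}{dt} = \sum_{j=1}^{N}\frac{\partial E}{\partial \mu_j}\frac{d\mu_j}{dt} = -\sum_{j=1}^{N}a_j(\mu_j)\,\varphi_j'(\mu_j)\left[b_j(\mu_j)-\sum_{i=1}^{N}c_{ji}\varphi_i(\mu_i)\right]^2$

Every factor in each term is nonnegative when $a_j(\mu_j) \ge 0$ and $\varphi_j'(\mu_j) \ge 0$, so $\frac{dE}{dt} \le 0$.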