Inclusion of feedback gives temporal characteristics to neural networks: recurrent networks.
Recurrent networks can be stable or unstable.
Our main interest is in the stability of recurrent networks: neurodynamics.
Preliminaries: Dynamic systems
- A dynamic system is a system whose state varies with time
- State-space model: values of state variables change over time
Example:
- A system of order $N$
- $x_1(t), x_2(t), \dots, x_N(t)$ are state variables that hold different values under the independent variable $t$ (time).
- $\textbf{x}(t) = [x_1(t), x_2(t), \dots, x_N(t)]^T$ is called the state vector.
- The dynamics of the system is \[\frac{d}{dt}x_j(t) = F_j(\textbf{x}(t)), \quad j=1,2,\dots,N\]
Or, in vector form:
\[\frac{d}{dt}\textbf{x}(t) = \textbf{F}(\textbf{x}(t))\]
System type
- Autonomous: $\textbf{F}$ does not explicitly depend on time.
- Non-autonomous: $\textbf{F}$ explicitly depends on time.
$\textbf{F}(\textbf{x})$ can be seen as a velocity vector field: every point of the state space is associated with a velocity vector $\textbf{F}(\textbf{x})$.
State space
Can view the state-space equation as describing the motion of a point in an $N$-dimensional state space.
The set of points traversed over time is called the trajectory or orbit. The tangent vector along the trajectory is the instantaneous velocity.
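As an illustration (not part of the original notes), the sketch below traces a trajectory numerically by forward Euler integration; the particular vector field (a damped linear oscillator) and the step size are assumptions chosen for the example.

```python
import numpy as np

def F(x):
    # Illustrative velocity vector field (damped linear oscillator):
    # dx1/dt = x2, dx2/dt = -x1 - 0.5 * x2
    return np.array([x[1], -x[0] - 0.5 * x[1]])

def trajectory(x0, dt=0.01, steps=2000):
    """Trace the orbit of the state vector x(t) by forward Euler integration."""
    xs = [np.asarray(x0, dtype=float)]
    for _ in range(steps):
        xs.append(xs[-1] + dt * F(xs[-1]))  # x(t + dt) ~ x(t) + dt * F(x(t))
    return np.array(xs)

orbit = trajectory([1.0, 0.0])
print(orbit[-1])  # the state spirals toward the equilibrium at the origin
```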
Solution of the equation
- Existence: a solution exists if $\textbf{F}(\textbf{x})$ is continuous in all of its arguments.
- Uniqueness: $\textbf{F}(\textbf{x})$ must meet the Lipschitz condition.
Lipschitz condition
Let $\textbf{x}$ and $\textbf{u}$ be a pair of vectors in an open set in a normed vector space. For some constant $K$, a vector function $\textbf{F}(\textbf{x})$ satisfies the Lipschitz condition if:
\[\Vert \textbf{F}(\textbf{x}) - \textbf{F}(\textbf{u}) \Vert \le K \Vert \textbf{x} - \textbf{u} \Vert\]
- $K$ is called the Lipschitz constant for $\textbf{F}(\textbf{x})$.
If the partial derivatives $\partial F_i / \partial x_j$ are finite everywhere, $\textbf{F}(\textbf{x})$ meets the Lipschitz condition.
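As an added illustration (the specific model form is an assumption, not from the original notes): for the additive model $\textbf{F}(\textbf{x}) = -\textbf{x} + \textbf{W}\tanh(\textbf{x})$, with $\tanh$ applied element-wise, all partial derivatives are bounded, and since $\tanh$ is 1-Lipschitz,
\[\Vert \textbf{F}(\textbf{x}) - \textbf{F}(\textbf{u}) \Vert \le \left(1 + \Vert \textbf{W} \Vert\right) \Vert \textbf{x} - \textbf{u} \Vert,\]
so $K = 1 + \Vert \textbf{W} \Vert$ is a valid Lipschitz constant.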
Equilibrium states
$\bar{\textbf{x}}$ is an equilibrium state when:
\[\left.\frac{d\textbf{x}}{dt} \right\vert_{\textbf{x}= \bar{\textbf{x}}} = \textbf{F}(\bar{\textbf{x}})=0\]
Linearize using a Taylor expansion around $\bar{\textbf{x}}$, writing $\textbf{x}(t) = \bar{\textbf{x}} + \Delta\textbf{x}(t)$:
\[\textbf{F}(\textbf{x}) \approx \textbf{F}(\bar{\textbf{x}}) + \textbf{A} \Delta \textbf{x}(t)\]
where $\textbf{A}$ is the Jacobian matrix:
\[\textbf{A} = \left.\frac{\partial}{\partial \mathbf{x}} \textbf{F}(\textbf{x}) \right\vert_{\textbf{x} =\bar{\textbf{x}}}\]
Stability of linearized system
The Jacobian matrix $\textbf{A}$ determines the behavior near the equilibrium point (via the eigenvalues of $\textbf{A}$). Since $\textbf{F}(\bar{\textbf{x}}) = 0$,
\[\frac{d}{dt}\Delta \textbf{x}(t) \approx \textbf{A} \Delta \textbf{x}(t)\]
The eigenvalues $\lambda$ of $\textbf{A}$ satisfy:
\[(\textbf{A}-\lambda \textbf{I}) \textbf{x} = 0\]
Example
2nd-Order System: classification of the equilibrium by the eigenvalues of $\textbf{A}$ (see the sketch after this list)
- Stable node (both eigenvalues real and negative)
- Unstable focus (complex conjugate pair with positive real part)
- Saddle point (real eigenvalues, one positive and one negative)
- Stable focus (complex conjugate pair with negative real part)
- Unstable node (both eigenvalues real and positive)
- Center (complex conjugate pair with zero real part)
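A minimal sketch of this classification in code; the Jacobian used at the end is an illustrative assumption.

```python
import numpy as np

def classify_equilibrium(A):
    """Classify the equilibrium of a 2nd-order linearized system d/dt dx = A dx."""
    eig = np.linalg.eigvals(A)
    re, im = eig.real, eig.imag
    if np.all(np.abs(im) < 1e-12):          # both eigenvalues real
        if np.all(re < 0):
            return "stable node"
        if np.all(re > 0):
            return "unstable node"
        return "saddle point"                # one positive, one negative
    if np.all(re < 0):
        return "stable focus"
    if np.all(re > 0):
        return "unstable focus"
    return "center"                          # purely imaginary pair

A = np.array([[0.0, 1.0],
              [-1.0, -0.5]])                 # illustrative Jacobian at the equilibrium
print(classify_equilibrium(A))               # -> stable focus
```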
Definition of stability
The equilibrium state $\bar{\textbf{x}}$ is:
- Uniformly stable: if for an arbitrary $\epsilon > 0$ there exists a positive $\delta$ such that $\Vert \textbf{x}(0) - \bar{\textbf{x}} \Vert < \delta$ implies $\Vert \textbf{x}(t) - \bar{\textbf{x}} \Vert < \epsilon$ for all $t > 0$.
- Convergent: if there exists a positive $\delta$ such that $\Vert \textbf{x}(0) - \bar{\textbf{x}} \Vert < \delta$ implies $\textbf{x}(t) \rightarrow \bar{\textbf{x}}$ as $t \rightarrow \infty$.
- Asymptotically stable: if it is both stable and convergent.
- Globally asymptotically stable: if it is stable and all trajectories of the system converge to $\bar{\textbf{x}}$ as time approaches infinity.
Lyapunov's Theorem
- Theorem 1: The equilibrium state $\bar{\textbf{x}}$ is stable if, in a small neighborhood of $\bar{\textbf{x}}$, there exists a positive definite function $V(\textbf{x})$ such that its derivative with respect to time is negative semidefinite in that region.
- Theorem 2: The equilibrium state $\bar{\textbf{x}}$ is asymptotically stable if, in a small neighborhood of $\bar{\textbf{x}}$, there exists a positive definite function $V(\textbf{x})$ such that its derivative with respect to time is negative definite in that region.
- A scalar function $V(\textbf{x})$ that satisfies these conditions is called a Lyapunov function for the equilibrium state $\bar{\textbf{x}}$ (a worked example follows below).
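A standard worked example (added here for illustration): for the system $\frac{d}{dt}\textbf{x} = -\textbf{x}$ with equilibrium $\bar{\textbf{x}} = 0$, the candidate $V(\textbf{x}) = \frac{1}{2}\Vert \textbf{x} \Vert^2$ is positive definite and
\[\frac{dV}{dt} = \textbf{x}^T \frac{d\textbf{x}}{dt} = -\Vert \textbf{x} \Vert^2 < 0 \quad \text{for } \textbf{x} \ne 0,\]
so by Theorem 2 the origin is asymptotically stable.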
Attractors
- Dissipative systems are characterized by attracting sets or manifolds of dimensionality lower than that of the embedding space. These are called attractors.
- Regions of initial conditions of nonzero state space volume converge to these attractors as time increases.
Types:
- Point attractors
- Limit cycle attractors
- Strange (chaotic) attractors
Neurodynamical models
We will focus on state variables that are continuous-valued.
Properties:
- Large number of DOF
- Nonlinearity
- Dissipative (as opposed to conservative), i.e. an open system
- Noise
Manipulation of attractors
As a recurrent neural network paradigm:
- Can identify attractors with computational objects.
- In order to do so, one must exercise control over the locations of the attractors in the state space of the system.
- A learning algorithm manipulates the equations governing the dynamical behavior so that the attractors are placed at desired locations.
- One good way to do this is the energy-minimization paradigm (e.g., the Hopfield model).
Hopfield model
- $N$ units with full connections among all nodes (no self-feedback).
- Given $M$ input patterns, each having the same dimensionality as the network, they can be memorized as attractors of the network.
- Starting from an initial pattern, the dynamics converge toward the attractor of the basin of attraction in which the initial pattern lies.
Discrete Hopfield model
Based on McCulloch-Pitts model
Energy function is defined as
\[E = - \frac{1}{2}\sum_{i=1}^{N}\sum_{j=1, j \ne i}^{N} w_{ji} x_i x_j\]
Network dynamics will evolve in the direction that minimizes $E$.
Content-Addressable Memory
- Map a set of patterns to be memorized onto fixed points of the dynamical system realized by the recurrent network.
- Encoding: mapping from a pattern $\xi_\mu$ to a fixed point $\bar{\textbf{x}}_\mu$.
- Decoding: mapping from a fixed point $\bar{\textbf{x}}_\mu$ back to the pattern $\xi_\mu$.
Storage
Learning is similar to Hebbian learning (outer-product rule):
\[w_{ji}=\frac{1}{N}\sum_{\mu=1}^{M}\xi_{\mu,j} \xi_{\mu,i}\]
with $w_{jj} = 0$ for all $j$ (no self-feedback).
Matrix form:
\[\textbf{W}=\frac{1}{N}\sum_{\mu=1}^{M} \xi_\mu \xi_\mu^T -\frac{M}{N} \textbf{I}\]
The resulting weight matrix $\textbf{W}$ is symmetric with a zero diagonal.
Activation
- Initialize the network with the probe pattern: \[x_j(0) = \xi_{\text{probe},j}, \quad j = 1,2,\dots,N\]
- Update the output of each neuron (picking neurons at random, i.e. asynchronously) as \[x_j(n+1)=\operatorname{sgn}\left( \sum_{i=1}^{N} w_{ji}x_i(n) \right)\] until the network reaches a fixed point (a code sketch follows below).
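A minimal runnable sketch of the storage and retrieval steps above, with pattern values in $\{-1,+1\}$; the specific patterns and the noisy probe are illustrative assumptions.

```python
import numpy as np

def train(patterns):
    """Outer-product (Hebbian) storage: W = (1/N) * sum_mu xi_mu xi_mu^T, zero diagonal."""
    patterns = np.asarray(patterns, dtype=float)
    M, N = patterns.shape
    W = patterns.T @ patterns / N
    np.fill_diagonal(W, 0.0)               # no self-feedback
    return W

def energy(W, x):
    """Hopfield energy E = -1/2 * sum_{i != j} w_ji x_i x_j (diagonal of W is zero)."""
    return -0.5 * x @ W @ x

def recall(W, probe, max_sweeps=100, seed=0):
    """Asynchronous updates x_j <- sgn(sum_i w_ji x_i) until a fixed point is reached."""
    rng = np.random.default_rng(seed)
    x = np.array(probe, dtype=float)
    for _ in range(max_sweeps):
        changed = False
        for j in rng.permutation(len(x)):   # pick neurons in random order
            new = 1.0 if W[j] @ x >= 0 else -1.0
            if new != x[j]:
                x[j] = new
                changed = True
        if not changed:                     # fixed point: no neuron changed in a full sweep
            break
    return x

patterns = np.array([[1, -1, 1, -1, 1, -1],
                     [1, 1, -1, -1, 1, 1]])
W = train(patterns)
probe = np.array([1, -1, 1, -1, -1, -1])    # noisy version of the first stored pattern
result = recall(W, probe)
print(result, energy(W, result))
```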
Spurious states
- $\textbf{W}$ is symmetric, thus its eigenvalues are all real.
- For a large number of patterns $M$, the matrix is degenerate, i.e., several eigenvectors share the same eigenvalue.
- These eigenvectors form a subspace; the subspace associated with the eigenvalue 0 is called the null space.
- The null space exists because the number of patterns $M$ is smaller than the number of neurons $N$.
- Hopfield network as content addressable memory:
- Discrete Hopfield network acts as a vector projector (it projects the probe vector onto the subspace spanned by the training patterns).
- Underlying dynamics drive the network to converge to one of the corners of the unit hypercube.
- Spurious states are those corners of the hypercube that do not belong to the training pattern set.
Storage capacity
Given a probe equal to a stored pattern $\xi_\nu$, the activation (induced local field) of the $j$th neuron can be decomposed into a signal term and a noise term:
\[v_j = \xi_{\nu,j}+\frac{1}{N}\sum_{\mu=1,\mu \ne \nu}^{M}\xi_{\mu,j} \sum_{i=1}^{N}\xi_{\mu,i} \xi_{\nu,i}\]
signal-to-noise ratio is defined as:
\[\rho = \frac{\mbox{variance of signal}}{\mbox{variance of noise}} = \frac{1}{(M-1)/N} \approx \frac{N}{M}\]
The load parameter $\alpha = M/N$ is the reciprocal of $\rho$; for reliable retrieval, $\alpha$ should be less than the critical value of about 0.14.
For almost error-free performance, storage capacity is
\[M_c = \frac{N}{2\log_e N}\]
Thus the storage capacity of the Hopfield network scales less than linearly with the size $N$ of the network; this is a major limitation of the Hopfield model.
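As a quick illustrative calculation (the network size is an assumption for the example): for $N = 1000$ neurons,
\[M_c = \frac{1000}{2\ln 1000} \approx \frac{1000}{13.8} \approx 72,\]
so only about 72 patterns can be stored almost error-free.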
Cohen-Grossberg Theorem
Cohen and Grossberg (1983) showed how to assess the stability of a certain class of neural networks described by the coupled nonlinear differential equations:
\[\frac{d}{dt} \mu_j = a_j(\mu_j) \left[b_j(\mu_j)- \sum_{i=1}^{N}c_{ji}\varphi_i(\mu_i)\right], \quad j= 1,2,\dots,N\]
A neural network with the above dynamics admits a Lyapunov function defined as:
\[E = \frac{1}{2}\sum_{i=1}^{N}\sum_{j=1}^{N}c_{ji}\varphi_i(\mu_i)\varphi_j(\mu_j)-\sum_{j=1}^{N}\int_0^{\mu_j}b_j(\lambda)\varphi_j'(\lambda)\,d\lambda\] where \[\varphi_j'(\lambda) = \frac{d}{d\lambda}\varphi_j(\lambda)\]
Conditions to be met:
- The synaptic weights are symmetric: $c_{ji} = c_{ij}$.
- The function $a_j(\mu_j)$ satisfies the nonnegativity condition: $a_j(\mu_j) \ge 0$.
- The nonlinear activation function $\varphi_j(\mu_j)$ needs to follow the monotonicity condition: \[\varphi_j'(\mu_j) = \frac{d}{d\mu_j}\varphi_j(\mu_j) \ge 0\]
- With the above, \[\frac{dE}{dt} \le 0\] ensuring global stability of the system.
- Hopfield model can be seen as a special case of the Cohen-Grossberg Theorem.
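To make the last point concrete, here is one consistent identification (an illustrative sketch, assuming the continuous Hopfield dynamics $C_j \frac{dv_j}{dt} = -\frac{v_j}{R_j} + \sum_{i=1}^{N} w_{ji}\varphi_i(v_i) + I_j$, which is not spelled out in the notes above): take
\[\mu_j = v_j, \quad a_j(\mu_j) = \frac{1}{C_j} \ge 0, \quad b_j(\mu_j) = -\frac{v_j}{R_j} + I_j, \quad c_{ji} = -w_{ji}, \quad \varphi_i(\mu_i) = \varphi_i(v_i).\]
Substituting these into the Cohen-Grossberg equation recovers the Hopfield dynamics, so the general Lyapunov function $E$ above yields the Hopfield energy function.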