Adaptive Filter
Training set (input/output pairs):  
\[\mathcal{T}: \{\textbf{x}(i), d(i);\ i = 1, 2, \dots, n\}\]  
- Filtering process: generation of an output based on the input.
- Adaptive process: automatic adjustment of the weights to reduce the error.
 
Optimal solution: minimize the cost function with respect to the weight vector.
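Assuming a linear filter output \(y(n) = \textbf{w}^T(n)\textbf{x}(n)\) and the usual mean-square-error criterion, the cost function is driven by the error signal:  
\[e(n) = d(n) - y(n), \qquad \mathcal{E}(\textbf{w}) = \tfrac{1}{2}\, E\!\left[e^2(n)\right]\]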
Steepest Descent
Define the gradient vector \(g(n)\) as the gradient of the cost function with respect to the weights, \(g(n) = \partial \mathcal{E}(\textbf{w})/\partial \textbf{w}\), evaluated at \(\textbf{w}(n)\).  
\[\textbf{w}(n+1)=\textbf{w}(n)-\eta g(n)\]
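A minimal sketch of this iteration, assuming the quadratic MSE cost above so that the gradient is \(g(n) = R\textbf{w}(n) - p\), with correlation matrix \(R = E[\textbf{x}\textbf{x}^T]\) and cross-correlation vector \(p = E[d\,\textbf{x}]\); the function name, statistics, and step size are illustrative assumptions:

```python
import numpy as np

def steepest_descent(R, p, eta=0.1, n_iter=1000, tol=1e-8):
    """Steepest descent for the quadratic MSE cost 1/2 w^T R w - p^T w.

    R : (m, m) input correlation matrix  E[x x^T]  (assumed known or estimated)
    p : (m,)   cross-correlation vector  E[d x]
    """
    w = np.zeros_like(p)
    for _ in range(n_iter):
        g = R @ w - p          # gradient g(n) at the current weights
        w = w - eta * g        # w(n+1) = w(n) - eta * g(n)
        if np.linalg.norm(g) < tol:
            break
    return w

# Toy example: 2-tap filter with made-up second-order statistics
R = np.array([[1.0, 0.5], [0.5, 1.0]])
p = np.array([0.7, 0.3])
print(steepest_descent(R, p))   # approaches the Wiener solution R^{-1} p
```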
Newton’s Method
An extension of steepest descent, where the second-order term of the Taylor series expansion is also used.  
Faster than steepest descent.  
Needs to satisfy certain conditions, such as the Hessian matrix \(\textbf{H}\) being positive definite (for an arbitrary \(\textbf{x} \ne 0\), \(\textbf{x}^T \textbf{H} \textbf{x} \gt 0\)).
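For reference, the standard Newton update replaces the fixed step size with the inverse Hessian:  
\[\textbf{w}(n+1) = \textbf{w}(n) - \textbf{H}^{-1}(n)\, g(n)\]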
Finally, for the linear least-squares problem this leads to the closed-form solution:  
\[\textbf{w} = (X^TX)^{-1}X^T d\]
- \(X\) does not need to be a square matrix
- output is linear
- no iteration needed (see the sketch below)
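A minimal numerical sketch of this closed-form solution; the data matrix `X` (one input vector per row), the desired vector `d`, and all values are illustrative assumptions:

```python
import numpy as np

# Data matrix: each row is one input vector x(i); d holds the desired responses.
X = np.array([[1.0, 0.5],
              [0.2, 1.0],
              [1.5, 0.3],
              [0.7, 0.9]])          # 4 examples, 2 weights -> X is not square
d = np.array([1.0, 0.4, 1.2, 0.9])

# Normal-equations solution  w = (X^T X)^{-1} X^T d
w = np.linalg.inv(X.T @ X) @ X.T @ d

# In practice a least-squares solver is numerically preferable:
w_lstsq, *_ = np.linalg.lstsq(X, d, rcond=None)

print(w, w_lstsq)                   # both give the same least-squares weights
```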
 
Least-Mean-Square Algorithm
The weight update is done with only one input/output pair at a time:  
\[\textbf{w}(n+1) = \textbf{w}(n) + \eta\, e(n)\, \textbf{x}(n), \qquad e(n) = d(n) - \textbf{w}^T(n)\textbf{x}(n)\]  
Good for tracking many small changes.
- like a low-pass filter
- simple, model-independent, robust
- follows the stochastic direction of steepest descent
- slow convergence
- sensitive to the condition number of the input correlation matrix
- converges when \(0 \lt \eta \lt \frac{2}{\lambda_{max}}\), where \(\lambda_{max}\) is the largest eigenvalue of the input correlation matrix
 
Can be improved by using a time-varying learning rate \(\eta(n)\).
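A minimal LMS sketch on synthetic system-identification data; the true weights, noise level, learning rate, and function name are illustrative assumptions:

```python
import numpy as np

def lms(X, d, eta=0.05):
    """LMS: update the weights with one (x, d) pair at a time."""
    w = np.zeros(X.shape[1])
    for x_n, d_n in zip(X, d):
        e_n = d_n - w @ x_n          # error for this single pair
        w = w + eta * e_n * x_n      # w(n+1) = w(n) + eta * e(n) * x(n)
    return w

# Synthetic example: identify a 3-tap system (values are arbitrary)
rng = np.random.default_rng(0)
w_true = np.array([0.8, -0.4, 0.25])
X = rng.standard_normal((500, 3))
d = X @ w_true + 0.01 * rng.standard_normal(500)

print(lms(X, d))                     # should approach w_true
```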
Perceptron
\[v=\sum_{i=1}^{m} w_i x_i + b\]  
\[y = \phi(v) = \begin{cases} 1 & \text{if } v \gt 0 \\ 0 & \text{if } v \le 0 \end{cases}\]
It is impossible for a single unit to represent XOR or EQUIV (neither is linearly separable).
Perceptron learning rule:  
\[w(n+1) =w(n) +\eta (n) e(n) x(n)\]  
where the error is \(e(n) = d(n) - y(n)\).  
The weight vector tilts toward the new input \(\textbf{x}\) to fit it, and the decision boundary, which is perpendicular to \(\textbf{w}\), changes accordingly.
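A minimal training sketch combining the activation and learning rule above; the AND data set, learning rate, and epoch count are illustrative assumptions:

```python
import numpy as np

def step(v):
    """Activation phi(v): 1 if v > 0, else 0."""
    return 1 if v > 0 else 0

def train_perceptron(X, d, eta=0.5, epochs=20):
    w = np.zeros(X.shape[1])   # weights
    b = 0.0                    # bias
    for _ in range(epochs):
        for x_n, d_n in zip(X, d):
            y = step(w @ x_n + b)
            e = d_n - y                 # error e(n) = d(n) - y(n)
            w = w + eta * e * x_n       # w(n+1) = w(n) + eta * e(n) * x(n)
            b = b + eta * e             # bias updated the same way
    return w, b

# Linearly separable example: logical AND (illustrative data)
X = np.array([[0, 0], [0, 1], [1, 0], [1, 1]], dtype=float)
d = np.array([0, 0, 0, 1])
w, b = train_perceptron(X, d)
print(w, b)   # a separating hyperplane for AND; XOR would never converge
```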
Perceptron Convergence Theorem
\[\frac{n^2\alpha^2}{\Vert \textbf{w}_0 \Vert^2} \le \Vert \textbf{w}(n+1) \Vert^2 \le n\beta \]
Thus, the number of iterations \(n\) cannot grow beyond a certain \(n_{max}\), at which point all inputs will be correctly classified:  
\[n_{max} = \frac{\beta \Vert \textbf{w}_0 \Vert^2}{\alpha^2}\]
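For reference, in the standard statement of the theorem (assuming \(\textbf{w}(0) = 0\) and a solution vector \(\textbf{w}_0\) that separates the two classes), the constants are  
\[\alpha = \min_{\textbf{x}(n) \in \mathcal{C}_1} \textbf{w}_0^T \textbf{x}(n), \qquad \beta = \max_{\textbf{x}(n) \in \mathcal{C}_1} \Vert \textbf{x}(n) \Vert^2,\]  
and equating the lower and upper bounds at \(n = n_{max}\) yields the expression above.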
