Adaptive Filter
Training set (input/output pair):
\[\mathcal{T}: \{\textbf{x}(i), d(i);\ i = 1, 2, \ldots, n\}\]
- Filtering process: generation of the output based on the input.
- Adaptive process: automatic adjustment of the weights to reduce the error.
The optimal solution minimizes the cost function with respect to the weight vector (see the formulas below).
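A minimal sketch of these quantities, assuming a linear filter and the usual mean-square-error cost (the symbols \(y(n)\), \(e(n)\), and \(\mathcal{E}(\textbf{w})\) are not defined above and are assumed here):

\[y(n) = \textbf{x}^T(n)\textbf{w}(n), \qquad e(n) = d(n) - y(n), \qquad \mathcal{E}(\textbf{w}) = \tfrac{1}{2}\,E\big[e^2(n)\big]\]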
Steepest Descent
Define the gradient vector as \(\textbf{g}(n) = \nabla \mathcal{E}(\textbf{w}(n))\), the gradient of the cost function with respect to the weights. The update is
\[\textbf{w}(n+1)=\textbf{w}(n)-\eta\,\textbf{g}(n)\]
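A minimal NumPy sketch of the steepest-descent iteration, assuming a linear filter and a mean-square-error cost estimated over the whole training set; `X`, `d`, `eta`, and `n_iters` are illustrative placeholders:

```python
import numpy as np

def steepest_descent(X, d, eta=0.1, n_iters=100):
    """Batch steepest descent for a linear filter y = X @ w.

    X : (n_samples, m) input matrix, d : (n_samples,) desired response.
    """
    n_samples, m = X.shape
    w = np.zeros(m)
    for _ in range(n_iters):
        e = d - X @ w                 # error over the whole training set
        g = -(X.T @ e) / n_samples    # gradient of the mean-square-error cost
        w = w - eta * g               # w(n+1) = w(n) - eta * g(n)
    return w
```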
Newton’s Method
An extension of steepest descent that also uses the second-order term of the Taylor series expansion of the cost function.
Converges faster than steepest descent.
Certain conditions must be satisfied, such as the Hessian matrix \(\textbf{H}\) being positive definite (i.e., \(\textbf{x}^T\textbf{H}\textbf{x} > 0\) for an arbitrary nonzero \(\textbf{x}\)).
Finally, when the output is linear in the weights, the cost is quadratic and the solution is obtained in closed form:
\[\textbf{w} = (X^TX)^{-1}X^T d\]
- \(X\) does not need to be a square matrix
- the output is linear in the weights
- no iteration is needed
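A minimal NumPy sketch of this closed-form solution; `np.linalg.lstsq` is used rather than forming \((X^TX)^{-1}\) explicitly, which is numerically safer but gives the same least-squares weights (`X` and `d` are placeholders):

```python
import numpy as np

def least_squares_weights(X, d):
    """Solve w = (X^T X)^{-1} X^T d without explicit matrix inversion."""
    w, *_ = np.linalg.lstsq(X, d, rcond=None)
    return w

# Equivalent (but less numerically stable) direct form:
# w = np.linalg.inv(X.T @ X) @ X.T @ d
```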
Least-Mean-Square Algorithm
The weight update is done with only one input/output pair at a time, so learning proceeds through many small changes.
- like a low-pass filter
- simple, model-independent, robust
- follows a stochastic approximation of the steepest-descent direction
- slow convergence
- sensitive to the input correlation matrix’s condition number
- converges in the mean when \(0 < \eta < 2/\lambda_{max}\), where \(\lambda_{max}\) is the largest eigenvalue of the input correlation matrix
Convergence can be improved by adopting a time-varying learning rate.
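A minimal NumPy sketch of LMS, assuming a fixed learning rate `eta`; the data `X`, `d` and the epoch loop are illustrative placeholders:

```python
import numpy as np

def lms(X, d, eta=0.01, n_epochs=10):
    """Least-mean-square: update the weights with one (x, d) pair at a time."""
    w = np.zeros(X.shape[1])
    for _ in range(n_epochs):
        for x, d_n in zip(X, d):
            y = w @ x                 # filter output for this sample
            e = d_n - y               # instantaneous error
            w = w + eta * e * x       # stochastic steepest-descent step
    return w
```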
Perceptron
\[v=\sum_{i=1}^{m} w_i x_i + b\]
\[y = \phi(v) = \begin{cases} 1 & \text{if } v > 0 \\ 0 & \text{if } v \le 0 \end{cases}\]
A single unit cannot represent XOR or EQUIV, since these functions are not linearly separable.
Perceptron learning rule:
\[w(n+1) =w(n) +\eta (n) e(n) x(n)\]
where the error is \(e(n) = d(n) - y(n)\).
The weight vector tilts to fit the new input \(\textbf{x}\), and the decision boundary, which is perpendicular to the weight vector, moves accordingly.
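A minimal NumPy sketch of the perceptron learning rule, assuming targets in {0, 1} and a bias absorbed into the weight vector; `X`, `d`, `eta`, and `n_epochs` are illustrative placeholders:

```python
import numpy as np

def train_perceptron(X, d, eta=1.0, n_epochs=10):
    """Perceptron learning rule: w(n+1) = w(n) + eta * e(n) * x(n).

    X : (n_samples, m) inputs, d : (n_samples,) targets in {0, 1}.
    """
    Xb = np.hstack([X, np.ones((X.shape[0], 1))])  # absorb bias into the weights
    w = np.zeros(Xb.shape[1])
    for _ in range(n_epochs):
        for x, d_n in zip(Xb, d):
            y = 1 if w @ x > 0 else 0   # hard-limiter activation
            e = d_n - y                  # error: -1, 0, or +1
            w = w + eta * e * x          # tilt w toward/away from x
    return w
```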
Perceptron Convergence Theorem
\[\frac{n^2\alpha^2}{\vert \vert w_0 \vert \vert^2} \le \vert \vert w(n+1) \vert \vert ^2 \le n\beta \]
Thus, the number of iterations cannot grow beyond a certain \(n_{max}\), at which point all inputs will be correctly classified:
\[n_{max} = \frac{\beta \vert \vert w_0 \vert \vert^2}{\alpha^2}\]
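For reference, a sketch of the constants under the usual statement of the theorem (e.g. Haykin's), which is assumed here since they are not defined above: \(\textbf{w}_0\) is a weight vector that correctly separates the two classes, and

\[\alpha = \min_{\textbf{x}(n) \in \mathcal{H}_1} \textbf{w}_0^T\textbf{x}(n), \qquad \beta = \max_{\textbf{x}(n) \in \mathcal{H}_1} \vert\vert \textbf{x}(n) \vert\vert^2\]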