Adaptive Filter
Training set (input/output pairs):
$$\mathcal{T}: \{x(i), d(i)\};\quad i = 1, 2, \ldots, n$$
$$x(i) = [x_1(i), x_2(i), \ldots, x_m(i)]^T$$
- Filtering process: generation of the output from the input: $y(i) = x^T(i)\,w$
- Adaptive process: automatic adjustment of the weights to reduce the error: $e(i) = d(i) - y(i)$
Optimal solution: minimize the cost function $E(w)$ with respect to the weight vector $w$.
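The two processes can be read as a simple loop; a minimal sketch, assuming NumPy and a hypothetical `update` callback that stands in for whichever weight-update rule is used (concrete rules follow in the sections below):

```python
import numpy as np

def adaptive_filter(X, d, update, w0=None):
    """Generic adaptive-filter loop (illustrative helper, not from the source).

    X      : (n, m) array whose rows are the input vectors x(i)
    d      : (n,) array of desired responses d(i)
    update : callable (w, x, e) -> new weight vector (the adaptive process)
    """
    w = np.zeros(X.shape[1]) if w0 is None else w0.copy()
    for x, target in zip(X, d):
        y = x @ w              # filtering process: y(i) = x^T(i) w
        e = target - y         # error signal: e(i) = d(i) - y(i)
        w = update(w, x, e)    # adaptive process: adjust the weights
    return w
```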
Steepest Descent
Define the gradient vector $\nabla E(w)$ as $g$.
$$w(n+1) = w(n) - \eta\, g(n)$$
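A minimal sketch of this iteration, assuming the quadratic cost $E(w) = \frac{1}{2}\|Xw - d\|^2$ (the cost, function name, and fixed $\eta$ are illustrative assumptions):

```python
import numpy as np

def steepest_descent(X, d, eta=0.01, n_iter=1000):
    """Minimize E(w) = 0.5 * ||X w - d||^2 by moving against the gradient."""
    w = np.zeros(X.shape[1])
    for _ in range(n_iter):
        g = X.T @ (X @ w - d)   # gradient vector g(n) = grad E(w(n))
        w = w - eta * g         # w(n+1) = w(n) - eta * g(n)
    return w
```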
Newton’s Method
An extension of steepest descent in which the second-order term of the Taylor series expansion is also used.
Converges faster than steepest descent.
Certain conditions must hold, such as the Hessian matrix $\nabla^2 E(w)$ being positive definite (for an arbitrary nonzero $x$, $x^T H x > 0$).
Finally:
$$w = (X^T X)^{-1} X^T d$$
- X does not need to be a square matrix.
- The output is linear ($y = x^T w$).
- No iteration is needed.
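Since the solution is closed-form, it can be computed in one step; a sketch assuming NumPy (the function name is illustrative, and the normal equations are solved directly rather than by forming the inverse, which is numerically safer):

```python
import numpy as np

def least_squares_weights(X, d):
    """Closed-form solution w = (X^T X)^{-1} X^T d.

    X need not be square; it only needs full column rank so that
    X^T X is invertible. No iteration is required.
    """
    return np.linalg.solve(X.T @ X, X.T @ d)

# Equivalent, and more robust for ill-conditioned X:
# w, *_ = np.linalg.lstsq(X, d, rcond=None)
```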
Least-Mean-Square Algorithm
The weight update is done with only one $(x(i), d(i))$ pair at a time.
Progress is made through many small weight changes.
- like a low-pass filter
- simple, model-independent, robust
- follows a stochastic approximation of the steepest-descent direction
- slow convergence
- sensitive to the condition number of the input correlation matrix
- converges when $0 < \eta < 2/\lambda_{\max}$, where $\lambda_{\max}$ is the largest eigenvalue of the input correlation matrix
Can be improved by using a time-varying learning rate $\eta(n)$.
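A minimal LMS sketch, updating the weights with one $(x(i), d(i))$ pair at a time (the function name, fixed learning rate, and epoch count are assumptions):

```python
import numpy as np

def lms(X, d, eta=0.05, n_epochs=20):
    """Least-mean-square: stochastic steepest descent, one sample at a time."""
    w = np.zeros(X.shape[1])
    for _ in range(n_epochs):
        for x, target in zip(X, d):
            e = target - x @ w    # instantaneous error e(i) = d(i) - y(i)
            w = w + eta * e * x   # w(i+1) = w(i) + eta * e(i) * x(i)
    return w
```

For stability, `eta` should respect the bound above; $\lambda_{\max}$ can be estimated from the eigenvalues of the sample correlation matrix $\frac{1}{n} X^T X$.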
Perceptron
$$v = \sum_{i=1}^{m} w_i x_i + b$$
$$y = \phi(v) = \begin{cases} 1 & \text{if } v > 0 \\ 0 & \text{if } v \leq 0 \end{cases}$$
It’s impossible for a single unit to represent XOR or EQUIV, since neither function is linearly separable.
Perceptron learning rule:
$$w(n+1) = w(n) + \eta(n)\, e(n)\, x(n)$$
where the error $e(n) = d(n) - y(n)$
The weight vector tilts to fit the new input $x$, and the decision boundary, which is perpendicular to $w$, shifts accordingly.
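A minimal perceptron sketch using the step activation and the learning rule above (the constant learning rate, epoch cap, and folding the bias $b$ into the weight vector are assumptions):

```python
import numpy as np

def perceptron_train(X, d, eta=1.0, n_epochs=100):
    """Perceptron learning rule: w(n+1) = w(n) + eta * e(n) * x(n)."""
    Xb = np.hstack([X, np.ones((X.shape[0], 1))])   # absorb the bias b into w
    w = np.zeros(Xb.shape[1])
    for _ in range(n_epochs):
        errors = 0
        for x, target in zip(Xb, d):
            y = 1 if x @ w > 0 else 0        # y = phi(v), step activation
            e = target - y
            if e != 0:
                w = w + eta * e * x          # tilt w toward/away from x
                errors += 1
        if errors == 0:                      # all inputs correctly classified
            break
    return w
```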
Perceptron Convergence Theorem
$$\frac{n^2 \alpha^2}{\|w_0\|^2} \leq \|w(n+1)\|^2 \leq n\,\beta$$
Thus, the number of iterations $n$ cannot grow beyond a certain $n_{\max}$, at which point all inputs are correctly classified:
$$n_{\max} = \frac{\beta\, \|w_0\|^2}{\alpha^2}$$
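The expression for $n_{\max}$ follows from the two bounds: the lower bound grows quadratically in $n$ while the upper bound grows only linearly, so both can hold only up to the point where they meet:
$$\frac{n_{\max}^2\, \alpha^2}{\|w_0\|^2} = n_{\max}\, \beta \quad\Longrightarrow\quad n_{\max} = \frac{\beta\, \|w_0\|^2}{\alpha^2}$$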