neural networks
If there are many original features, the number of combined quadratic or cubic features grows much faster. It's computationally expensive.
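To see how fast this grows, here is a small sketch (the helper `n_poly_terms` is just an illustration, not from the course) that counts the distinct monomials of a given degree:

```python
from itertools import combinations_with_replacement

def n_poly_terms(n_features, degree):
    """Count the distinct monomials of exactly `degree` built from n_features inputs."""
    return sum(1 for _ in combinations_with_replacement(range(n_features), degree))

# With 100 raw features there are already thousands of combined terms.
print(n_poly_terms(100, 2))  # 5050 quadratic terms
print(n_poly_terms(100, 3))  # 171700 cubic terms
```

So a polynomial feature map blows up combinatorially, which is one motivation for letting a network learn its own features instead.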
Origins: algorithms trying to mimic the brain
Neuron in the brain
- Dendrite: input wires
- Cell body with nucleus: do some calculation
- Axon: output wires
Use logistic units to represent neurons
Sigmoid activation function \(g(z)=\frac{1}{1+e^{-z}}\)
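A minimal numpy sketch of the sigmoid activation:

```python
import numpy as np

def sigmoid(z):
    """Sigmoid activation g(z) = 1 / (1 + e^{-z}); squashes any real z into (0, 1)."""
    return 1.0 / (1.0 + np.exp(-z))

print(sigmoid(0))   # 0.5
print(sigmoid(4))   # ≈ 0.982
print(sigmoid(-4))  # ≈ 0.018
```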
Introduce many layers, each receiving input from the previous layer and sending output to the next. The intermediate layers are called hidden layers.
\(a_i^{(j)}\) = activation of unit i in layer j
\(\Theta^{(j)}\) = matrix of weights controlling the function mapping from layer j to layer j+1. The number of rows is the number of units in layer j+1; the number of columns is 1 + the number of units in layer j.
Add \(a_0^{(j)}=1\) as a bias unit
vectorized form
It’s called forward propagation
Neural Network learning its own features
\(a^{(1)} = [a_0^{(1)}, a_1^{(1)}, a_2^{(1)}, a_3^{(1)}]^T\); (\(a_0^{(1)} = 1\) is added)

\(z^{(2)}=[z_1^{(2)},z_2^{(2)},z_3^{(2)}]^T\)

\(z^{(2)}=\Theta^{(1)}a^{(1)}\)

\(a^{(2)}=g(z^{(2)})\)

Add \(a_0^{(2)}=1\)

\(z^{(3)}=\Theta^{(2)}a^{(2)}\)

\(h_{\Theta}(x)=a^{(3)}=g(z^{(3)})\)

Architectures: how neurons connect with each other
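The forward propagation steps above can be sketched in numpy for a 3-layer network (the layer sizes and random weights below are just an assumption for illustration):

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def forward_propagate(x, Theta1, Theta2):
    """One forward pass through a 3-layer network.
    Theta1 has shape (s2, n+1), Theta2 has shape (K, s2+1):
    each Theta^(j) maps layer j to layer j+1."""
    a1 = np.insert(x, 0, 1.0)             # add bias unit a0^(1) = 1
    z2 = Theta1 @ a1                      # z^(2) = Theta^(1) a^(1)
    a2 = np.insert(sigmoid(z2), 0, 1.0)   # a^(2) = g(z^(2)), then add bias a0^(2) = 1
    z3 = Theta2 @ a2                      # z^(3) = Theta^(2) a^(2)
    return sigmoid(z3)                    # h_Theta(x) = a^(3) = g(z^(3))

# Example: 3 inputs, 3 hidden units, 1 output, with arbitrary random weights.
rng = np.random.default_rng(0)
Theta1 = rng.normal(size=(3, 4))  # 3 hidden units x (3 inputs + bias)
Theta2 = rng.normal(size=(1, 4))  # 1 output unit x (3 hidden units + bias)
h = forward_propagate(np.array([1.0, 0.5, -1.0]), Theta1, Theta2)
print(h)  # a single value in (0, 1)
```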
applications
example XOR/XNOR
XNOR gives 1 if \(x_1\) and \(x_2\) are both 0 or both 1.
simple example AND
\(x_1, x_2 \in \left\{ 0,1 \right\}\)
\(y = x_1 \text{ AND } x_2\)
\(g(4) \approx 0.98; g(-4) \approx 0.02\)
Select parameters \(\Theta\) so that \(h_{\Theta}(x) = 1\) when \(x_1 = 1\) and \(x_2 = 1\), and 0 otherwise.
Have units computing \(x_1 \text{ AND } x_2\), \((\text{ NOT }x_1) \text{ AND } (\text{ NOT } x_2)\), \(x_1 \text{ OR } x_2\); can be put together to compute \(x_1 \text{ XNOR } x_2\).
Use 3 layers: the first set of parameters computes AND and \((\text{NOT } x_1) \text{ AND } (\text{NOT } x_2)\); the second computes OR.
\(\Theta\) value for different logistic units:
AND: \(\Theta^{(1)}=[-30, 20, 20]\)
NOR: \(\Theta^{(1)}=[10, -20, -20]\)
OR: \(\Theta^{(1)}=[-10, 20, 20]\)
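Plugging these weights into the 3-layer network gives XNOR (hidden layer: AND and NOR; output layer: OR of the two hidden units):

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

# Layer 1: row 0 computes x1 AND x2, row 1 computes (NOT x1) AND (NOT x2), i.e. NOR
Theta1 = np.array([[-30.0,  20.0,  20.0],
                   [ 10.0, -20.0, -20.0]])
# Layer 2: OR of the two hidden units
Theta2 = np.array([-10.0, 20.0, 20.0])

def xnor(x1, x2):
    a1 = np.array([1.0, x1, x2])                  # input with bias unit
    a2 = np.insert(sigmoid(Theta1 @ a1), 0, 1.0)  # hidden activations with bias
    return sigmoid(Theta2 @ a2)                   # OR of (AND, NOR) = XNOR

for x1, x2 in [(0, 0), (0, 1), (1, 0), (1, 1)]:
    print(x1, x2, round(float(xnor(x1, x2))))  # truth table: 1, 0, 0, 1
```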
multiclass classification
The neural network can have multiple output units. For a case with 4 outputs, each label is a one-hot vector, e.g. \(y^{(i)}=[1,0,0,0]^T\) for the first class.
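A small sketch of this one-hot encoding (the helper `to_one_hot` is a hypothetical name for illustration):

```python
import numpy as np

def to_one_hot(label, num_classes=4):
    """Encode class `label` (1-indexed) as the target vector y^(i)."""
    y = np.zeros(num_classes)
    y[label - 1] = 1.0
    return y

print(to_one_hot(1))  # [1. 0. 0. 0.]
print(to_one_hot(3))  # [0. 0. 1. 0.]
```

At prediction time, the network output \(h_{\Theta}(x)\) is a vector of 4 values, and the predicted class is the index of the largest one.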