2017-01-04

# neural networks

With many original features $$n$$, the number of combined quadratic terms grows as $$O(n^2)$$ and cubic terms as $$O(n^3)$$, which is computationally expensive.
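To make the blowup concrete, here is a small sketch (function name is my own) counting the distinct quadratic terms $$x_i x_j$$ with $$i \le j$$, which number $$n(n+1)/2$$:

```python
from math import comb

def num_quadratic_terms(n):
    """Count distinct x_i * x_j terms with i <= j: n*(n+1)/2, i.e. O(n^2)."""
    return comb(n + 1, 2)

# 100 original features already give thousands of quadratic features
print(num_quadratic_terms(100))  # 5050
```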

Origins: algorithms that try to mimic the brain

## Neuron in the brain

1. Dendrite: input wires
2. Cell body with nucleus: do some calculation
3. Axon: output wires

Use logistic units to represent neurons
Sigmoid activation function $$g(z)=\frac{1}{1+e^{-z}}$$
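The sigmoid above can be written directly with NumPy (a minimal sketch):

```python
import numpy as np

def sigmoid(z):
    """Sigmoid activation g(z) = 1 / (1 + e^(-z))."""
    return 1.0 / (1.0 + np.exp(-z))

print(sigmoid(0))   # 0.5
print(sigmoid(4))   # ~0.98
print(sigmoid(-4))  # ~0.02
```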

Introduce multiple layers; each layer receives input from the previous layer and outputs to the next. The intermediate layers are called hidden layers.

$$a_i^{(j)}$$ = activation of unit i in layer j
$$\Theta^{(j)}$$ = matrix of weights controlling function mapping from layer j to layer j+1. No. rows is No. units in layer j+1; No. cols is 1 + No. units in layer j

Add $$a_0^{(j)}=1$$ as a bias

## vectorized form

It’s called forward propagation

Neural Network learning its own features

$$a^{(1)} = [a_0^{(1)}, a_1^{(1)}, a_2^{(1)}, a_3^{(1)}]^T$$; ($$a_0^{(1)} = 1$$ is added)
$$z^{(2)}=[z_1^{(2)},z_2^{(2)},z_3^{(2)}]^T$$

Architectures: how neurons connect with each other

$$z^{(2)}=\Theta^{(1)}a^{(1)}$$
$$a^{(2)}=g(z^{(2)})$$

Add $$a_0^{(2)}=1$$
$$z^{(3)}=\Theta^{(2)}a^{(2)}$$
$$h_{\Theta}(x)=a^{(3)}=g(z^{(3)})$$
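The vectorized forward propagation above can be sketched in NumPy (function and variable names are my own; `Theta1` and `Theta2` follow the weight-matrix shapes defined earlier):

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def forward(x, Theta1, Theta2):
    """Forward propagation for a 3-layer network.

    Theta1: (units in layer 2) x (1 + units in layer 1)
    Theta2: (units in layer 3) x (1 + units in layer 2)
    """
    a1 = np.insert(x, 0, 1.0)             # add bias unit a0^(1) = 1
    z2 = Theta1 @ a1                      # z^(2) = Theta^(1) a^(1)
    a2 = np.insert(sigmoid(z2), 0, 1.0)   # a^(2) = g(z^(2)), add a0^(2) = 1
    z3 = Theta2 @ a2                      # z^(3) = Theta^(2) a^(2)
    return sigmoid(z3)                    # h_Theta(x) = a^(3)
```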

# applications

## example XOR/XNOR

XNOR gives 1 if $$x_1$$ and $$x_2$$ are both 0 or both 1.

### simple example AND

$$x_1, x_2 \in \left\{ 0,1 \right\}$$
$$y = x_1 \text{ AND } x_2$$

$$g(4) \approx 0.98; g(-4) \approx 0.02$$

Select parameters $$\Theta$$ so that when $$x_1 = 1; x_2 = 1$$, $$h_{\Theta}(x) = 1$$; otherwise, 0.

Have units computing $$x_1 \text{ AND } x_2$$, $$(\text{ NOT }x_1) \text{ AND } (\text{ NOT } x_2)$$, $$x_1 \text{ OR } x_2$$; can be put together to compute $$x_1 \text{ XNOR } x_2$$.
Use 3 layers: the first set of parameters computes AND and $$(\text{NOT }x_1) \text{ AND } (\text{NOT } x_2)$$; the second computes OR.

$$\Theta$$ value for different logistic units:
AND: $$\Theta^{(1)}=[-30, 20, 20]$$
NOR: $$\Theta^{(1)}=[10, -20, -20]$$
OR: $$\Theta^{(1)}=[-10, 20, 20]$$

## multiclass classification

The neural network can have multiple output units.

For a case with 4 output classes, each label is encoded as a vector, e.g. $$y^{(i)}=[1,0,0,0]^T$$ for the first class.
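This label encoding (one-hot) can be sketched as follows (function name is my own):

```python
import numpy as np

def one_hot(label, num_classes):
    """Encode a class index as a one-hot vector for multiclass training."""
    y = np.zeros(num_classes)
    y[label] = 1.0
    return y

print(one_hot(0, 4))  # [1. 0. 0. 0.]
```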