neural networks
If there are many original features, the number of combined quadratic or cubic features grows much faster. It's computationally expensive.
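To see how fast this grows, here is a small sketch (the helper `n_poly_terms` is just an illustration, not from the course) that counts the distinct monomials of a given degree:

```python
from itertools import combinations_with_replacement

def n_poly_terms(n_features, degree):
    """Count the distinct monomials of exactly `degree` built from n_features inputs."""
    return sum(1 for _ in combinations_with_replacement(range(n_features), degree))

# With 100 raw features there are already thousands of combined terms.
print(n_poly_terms(100, 2))  # 5050 quadratic terms
print(n_poly_terms(100, 3))  # 171700 cubic terms
```

So a polynomial feature map blows up combinatorially, which is one motivation for letting a network learn its own features instead.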
Origins: algorithms trying to mimic the brain
Neuron in the brain
- Dendrite: input wires
- Cell body with nucleus: do some calculation
- Axon: output wires
Use logistic units to represent neurons
Sigmoid activation function \(g(z)=\frac{1}{1+e^{-z}}\)
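A minimal numpy sketch of the sigmoid activation:

```python
import numpy as np

def sigmoid(z):
    """Sigmoid activation g(z) = 1 / (1 + e^{-z}); squashes any real z into (0, 1)."""
    return 1.0 / (1.0 + np.exp(-z))

print(sigmoid(0))   # 0.5
print(sigmoid(4))   # ≈ 0.982
print(sigmoid(-4))  # ≈ 0.018
```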
Introduce many layers, each receiving input from the previous layer and sending output to the next. The intermediate layers are called hidden layers.
\(a_i^{(j)}\) = activation of unit i in layer j
\(\Theta^{(j)}\) = matrix of weights controlling the function mapping from layer j to layer j+1. The number of rows is the number of units in layer j+1; the number of columns is 1 + the number of units in layer j.
Add \(a_0^{(j)}=1\) as a bias unit
vectorized form
It’s called forward propagation
Neural Network learning its own features
\(a^{(1)} = [a_0^{(1)}, a_1^{(1)}, a_2^{(1)}, a_3^{(1)}]^T\); (\(a_0^{(1)} = 1\) is added)

\(z^{(2)}=[z_1^{(2)},z_2^{(2)},z_3^{(2)}]^T\)

\(z^{(2)}=\Theta^{(1)}a^{(1)}\)

\(a^{(2)}=g(z^{(2)})\)

Add \(a_0^{(2)}=1\)

\(z^{(3)}=\Theta^{(2)}a^{(2)}\)

\(h_{\Theta}(x)=a^{(3)}=g(z^{(3)})\)

Architectures: how neurons connect with each other
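The forward propagation steps above can be sketched in numpy for a 3-layer network (the layer sizes and random weights below are just an assumption for illustration):

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def forward_propagate(x, Theta1, Theta2):
    """One forward pass through a 3-layer network.
    Theta1 has shape (s2, n+1), Theta2 has shape (K, s2+1):
    each Theta^(j) maps layer j to layer j+1."""
    a1 = np.insert(x, 0, 1.0)             # add bias unit a0^(1) = 1
    z2 = Theta1 @ a1                      # z^(2) = Theta^(1) a^(1)
    a2 = np.insert(sigmoid(z2), 0, 1.0)   # a^(2) = g(z^(2)), then add bias a0^(2) = 1
    z3 = Theta2 @ a2                      # z^(3) = Theta^(2) a^(2)
    return sigmoid(z3)                    # h_Theta(x) = a^(3) = g(z^(3))

# Example: 3 inputs, 3 hidden units, 1 output, with arbitrary random weights.
rng = np.random.default_rng(0)
Theta1 = rng.normal(size=(3, 4))  # 3 hidden units x (3 inputs + bias)
Theta2 = rng.normal(size=(1, 4))  # 1 output unit x (3 hidden units + bias)
h = forward_propagate(np.array([1.0, 0.5, -1.0]), Theta1, Theta2)
print(h)  # a single value in (0, 1)
```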
applications
example XOR/XNOR
XNOR gives 1 if \(x_1\) and \(x_2\) are both 0 or both 1.
simple example AND
\(x_1, x_2 \in \left\{ 0,1 \right\}\)
\(y = x_1 \text{ AND } x_2\)
\(g(4) \approx 0.98; g(-4) \approx 0.02\)
Select parameters \(\Theta\) so that \(h_{\Theta}(x) = 1\) when \(x_1 = 1\) and \(x_2 = 1\), and 0 otherwise.
Have units computing \(x_1 \text{ AND } x_2\), \((\text{ NOT }x_1) \text{ AND } (\text{ NOT } x_2)\), \(x_1 \text{ OR } x_2\); can be put together to compute \(x_1 \text{ XNOR } x_2\).
Use 3 layers: the first set of parameters computes AND and \((\text{NOT } x_1) \text{ AND } (\text{NOT } x_2)\); the second computes OR.
\(\Theta\) value for different logistic units:
AND: \(\Theta^{(1)}=[-30, 20, 20]\)
NOR: \(\Theta^{(1)}=[10, -20, -20]\)
OR: \(\Theta^{(1)}=[-10, 20, 20]\)
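Plugging these weights into the 3-layer network gives XNOR (hidden layer: AND and NOR; output layer: OR of the two hidden units):

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

# Layer 1: row 0 computes x1 AND x2, row 1 computes (NOT x1) AND (NOT x2), i.e. NOR
Theta1 = np.array([[-30.0,  20.0,  20.0],
                   [ 10.0, -20.0, -20.0]])
# Layer 2: OR of the two hidden units
Theta2 = np.array([-10.0, 20.0, 20.0])

def xnor(x1, x2):
    a1 = np.array([1.0, x1, x2])                  # input with bias unit
    a2 = np.insert(sigmoid(Theta1 @ a1), 0, 1.0)  # hidden activations with bias
    return sigmoid(Theta2 @ a2)                   # OR of (AND, NOR) = XNOR

for x1, x2 in [(0, 0), (0, 1), (1, 0), (1, 1)]:
    print(x1, x2, round(float(xnor(x1, x2))))  # truth table: 1, 0, 0, 1
```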
multiclass classification
The neural network can have multiple output units. For a case with 4 outputs, each label is a one-hot vector, e.g. \(y^{(i)}=[1,0,0,0]^T\) for the first class.
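A small sketch of this one-hot encoding (the helper `to_one_hot` is a hypothetical name for illustration):

```python
import numpy as np

def to_one_hot(label, num_classes=4):
    """Encode class `label` (1-indexed) as the target vector y^(i)."""
    y = np.zeros(num_classes)
    y[label - 1] = 1.0
    return y

print(to_one_hot(1))  # [1. 0. 0. 0.]
print(to_one_hot(3))  # [0. 0. 1. 0.]
```

At prediction time, the network output \(h_{\Theta}(x)\) is a vector of 4 values, and the predicted class is the index of the largest one.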