Learning higher-level abstractions/representations from data

Motivation: how the brain represents and processes sensory information in a hierarchical manner.

Deep learning is based on neural networks.

- Complex models with a large number of parameters
  - Hierarchical representations
  - More parameters → higher accuracy on the training data
  - Simple learning rule for training (gradient descent)
- Lots of data
  - Needed for better generalization performance
  - High-dimensional inputs require exponentially many training samples (curse of dimensionality)
- Lots of computing power: GPGPU
  - Training is time-consuming

## History

### Long history

- Fukushima’s Neocognitron (1980)
- LeCun’s Convolutional neural networks (1989)
- Schmidhuber’s work on stacked recurrent neural networks (1993), which ran into the vanishing gradient problem

### Rise

- Geoffrey Hinton (2006)
- Andrew Ng & Jeff Dean (2012)
- Schmidhuber, Recurrent neural network using LSTM (2011-)
- Google Deepmind (2015,2016)
- ICLR conference, held since 2013
- Deep Learning textbook (Goodfellow, Bengio & Courville, 2016)

## Current trend

- Deep belief networks (based on Boltzmann machine), Hinton
- Convolutional neural networks, LeCun
- Deep Q-learning Network (extension to reinforcement learning)
- Deep recurrent neural network using LSTM
- Representation learning
- Reinforcement learning
- Extended memory

## Boltzmann machine

Given test data, return the closest data in the training set

## Deep belief net

A deep belief net is trained layer by layer using RBMs.

- Overcomes issues with the logistic belief net (Hinton)
- Based on the Restricted Boltzmann Machine (RBM): a Boltzmann machine with no within-layer connections
- RBM back-and-forth update: update the hidden units given the visible units, then update the visible units given the hidden units
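The back-and-forth update can be sketched as one Gibbs sampling step for a binary RBM. This is a minimal NumPy sketch; the layer sizes, random initialization, and variable names are illustrative, not from the notes.

```python
import numpy as np

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

rng = np.random.default_rng(0)

# Illustrative sizes: 6 visible units, 4 hidden units.
n_visible, n_hidden = 6, 4
W = rng.normal(0, 0.1, size=(n_visible, n_hidden))  # visible-hidden weights
b = np.zeros(n_visible)                             # visible biases
c = np.zeros(n_hidden)                              # hidden biases

v = rng.integers(0, 2, size=n_visible).astype(float)  # a binary visible vector

# Back-and-forth update:
# 1) update (sample) hidden units given visible units
p_h = sigmoid(v @ W + c)
h = (rng.random(n_hidden) < p_h).astype(float)
# 2) update (sample) visible units given hidden units
p_v = sigmoid(h @ W.T + b)
v_new = (rng.random(n_visible) < p_v).astype(float)
```

Because the RBM has no within-layer connections, all hidden units can be sampled in parallel given the visible layer, and vice versa.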

### Training

1. Train an RBM on the input data to form a hidden representation
2. Use that hidden representation as the input to train another RBM
3. Repeat step 2 for each additional layer
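The greedy layer-by-layer recipe can be sketched as follows. The `train_rbm` helper is a simplified CD-1 trainer written for this sketch (biases omitted for brevity); the toy data and layer sizes are illustrative.

```python
import numpy as np

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

def train_rbm(data, n_hidden, epochs=5, lr=0.1, seed=0):
    """Train one RBM with a simplified CD-1 update; return its weights."""
    rng = np.random.default_rng(seed)
    n_visible = data.shape[1]
    W = rng.normal(0, 0.1, size=(n_visible, n_hidden))
    for _ in range(epochs):
        v0 = data
        p_h0 = sigmoid(v0 @ W)                       # hidden given visible
        h0 = (rng.random(p_h0.shape) < p_h0).astype(float)
        p_v1 = sigmoid(h0 @ W.T)                     # visible given hidden
        p_h1 = sigmoid(p_v1 @ W)
        W += lr * (v0.T @ p_h0 - p_v1.T @ p_h1) / len(data)
    return W

def propagate(data, W):
    """Hidden representation that becomes the next RBM's input."""
    return sigmoid(data @ W)

# Greedy layer-by-layer stacking on toy binary data.
rng = np.random.default_rng(1)
data = rng.integers(0, 2, size=(100, 20)).astype(float)
layer_sizes = [16, 8]
weights, x = [], data
for n_hidden in layer_sizes:
    W = train_rbm(x, n_hidden)
    weights.append(W)
    x = propagate(x, W)  # repeat: hidden layer is the next visible layer
```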

## Deep convolutional neural networks

- Convolution layer (kernels)
- Stride of n (the window slides by n pixels)
- Max pooling
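These three ingredients can be sketched in NumPy: a strided convolution (cross-correlation, as typically implemented in CNNs) followed by non-overlapping max pooling. The toy image and averaging kernel are illustrative.

```python
import numpy as np

def conv2d(image, kernel, stride=1):
    """Valid 2D cross-correlation, sliding the window by `stride` pixels."""
    kh, kw = kernel.shape
    oh = (image.shape[0] - kh) // stride + 1
    ow = (image.shape[1] - kw) // stride + 1
    out = np.zeros((oh, ow))
    for i in range(oh):
        for j in range(ow):
            patch = image[i*stride:i*stride+kh, j*stride:j*stride+kw]
            out[i, j] = np.sum(patch * kernel)
    return out

def max_pool(fmap, size=2):
    """Non-overlapping max pooling over size x size windows."""
    oh, ow = fmap.shape[0] // size, fmap.shape[1] // size
    out = np.zeros((oh, ow))
    for i in range(oh):
        for j in range(ow):
            out[i, j] = fmap[i*size:(i+1)*size, j*size:(j+1)*size].max()
    return out

image = np.arange(36, dtype=float).reshape(6, 6)  # toy 6x6 "image"
kernel = np.ones((3, 3)) / 9.0                    # 3x3 averaging kernel
fmap = conv2d(image, kernel, stride=1)            # 4x4 feature map
pooled = max_pool(fmap, size=2)                   # 2x2 after pooling
```

A stride of 2 in `conv2d` would halve the output resolution, which is why striding and pooling both act as downsampling steps in deep CNNs.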

## Deep Q-Network

Google DeepMind, Atari 2600

- Input: video screen
- Output: Q(s, a)
  - Action-value function
  - Value of taking action a when in state s

- Reward: game score
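DQN approximates Q(s, a) with a deep network trained on screen input; the update it performs is the standard Q-learning rule, which can be sketched in tabular form. The state/action counts, learning rate, and discount below are illustrative.

```python
import numpy as np

# Tabular Q-learning sketch (DQN replaces this table with a deep network
# that maps screen pixels to Q-values, one output per action).
n_states, n_actions = 4, 2
Q = np.zeros((n_states, n_actions))   # Q(s, a): value of action a in state s
alpha, gamma = 0.5, 0.9               # learning rate, discount factor

def q_update(s, a, reward, s_next):
    """Move Q(s, a) toward the target reward + gamma * max_a' Q(s', a')."""
    target = reward + gamma * Q[s_next].max()
    Q[s, a] += alpha * (target - Q[s, a])

# One step: in state 0, take action 1, receive reward 1.0 (e.g. game score),
# land in state 2.
q_update(s=0, a=1, reward=1.0, s_next=2)
```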

## Deep recurrent neural networks

- Feedforward: no memory of past input
- Recurrent:
  - Good: past input affects present output
  - Bad: cannot remember far into the past

Backprop in time:

- Can unfold recurrent loop: make it into a feedforward net
- Use the same backprop algorithm for training
- Cannot remember too far into the past
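Unfolding can be sketched as a plain forward pass with one copy of the weights applied per time step; backprop then runs through this chain exactly as in a feedforward net. The layer sizes and random weights below are illustrative.

```python
import numpy as np

def rnn_forward(xs, W_in, W_rec, h0):
    """Unfold the recurrent loop: h_t = tanh(W_in x_t + W_rec h_{t-1})."""
    h = h0
    hs = []
    for x in xs:                 # one "layer" of the unfolded net per step
        h = np.tanh(W_in @ x + W_rec @ h)
        hs.append(h)
    return hs                    # backprop in time runs back through this list

rng = np.random.default_rng(0)
W_in = rng.normal(0, 0.5, size=(3, 2))       # input-to-hidden weights
W_rec = rng.normal(0, 0.5, size=(3, 3))      # hidden-to-hidden (the recurrence)
xs = [rng.normal(size=2) for _ in range(5)]  # a sequence of 5 inputs
hs = rnn_forward(xs, W_in, W_rec, np.zeros(3))
```

Note that gradients flowing back through this chain are multiplied by `W_rec` (and the tanh derivative) once per step, which is why they vanish over long sequences.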

## Long Short-Term Memory

- LSTM to the rescue (Hochreiter & Schmidhuber, 1997)
- Built-in recurrent memory that can be written (Input gate), reset (Forget gate), and outputted (Output gate)
- Long-term retention possible with LSTM
- Unfold in time and use backprop as usual
- Application: Sequence classification, sequence translation, sequence prediction