Learning higher-level abstractions/representations from data
Motivation: how the brain represents and processes sensory information in a hierarchical manner.
Deep learning is based on neural networks.
- Complex models with large number of parameters
- Hierarchical representations
- More parameters allow a closer fit to the training data
- Simple learning rule for training (gradient)
- Lots of data
- Needed to get better generalization performance
- High-dimensional inputs require exponentially many training examples (curse of dimensionality)
- Lots of computing power: GPGPU
- Time consuming
History
Long history
- Fukushima’s Neocognitron (1980)
- LeCun’s Convolutional neural networks (1989)
- Schmidhuber’s work on stacked recurrent neural networks (1993); ran into the vanishing gradient problem
Rise
- Geoffrey Hinton (2006)
- Andrew Ng & Jeff Dean (2012)
- Schmidhuber, Recurrent neural network using LSTM (2011-)
- Google Deepmind (2015,2016)
- ICLR (International Conference on Learning Representations), held since 2013
- Deep Learning textbook by Goodfellow, Bengio, and Courville (2016)
Current trend
- Deep belief networks (based on Boltzmann machine), Hinton
- Convolutional neural networks, LeCun
- Deep Q-learning Network (extension to reinforcement learning)
- Deep recurrent neural network using LSTM
- Representation learning
- Reinforcement learning
- Extended memory
Boltzmann machine
Given a test input, the network settles to the closest stored training pattern (it acts as an associative memory)
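A minimal sketch of this recall behaviour, assuming binary stochastic units and an illustrative random weight matrix (not taken from the notes): start the network at a noisy test pattern and let stochastic updates settle it toward a low-energy stored pattern.

```python
import numpy as np

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

def energy(s, W, b):
    # E(s) = -1/2 s^T W s - b^T s  (W symmetric, zero diagonal)
    return -0.5 * s @ W @ s - b @ s

def gibbs_step(s, W, b, rng):
    # Update each binary unit given the current state of all the others.
    for i in rng.permutation(len(s)):
        p_on = sigmoid(W[i] @ s + b[i])
        s[i] = 1.0 if rng.random() < p_on else 0.0
    return s

rng = np.random.default_rng(0)
n = 8
W = rng.normal(0, 0.5, (n, n)); W = (W + W.T) / 2; np.fill_diagonal(W, 0.0)  # illustrative weights
b = np.zeros(n)
s = rng.integers(0, 2, n).astype(float)   # noisy test pattern
for _ in range(20):
    s = gibbs_step(s, W, b, rng)
print("settled state:", s, "energy:", energy(s, W, b))
```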
Deep belief net
A deep belief net is trained layer by layer using RBMs
- Overcomes issues with logistic (sigmoid) belief nets (Hinton)
- Based on the Restricted Boltzmann Machine (RBM): a Boltzmann machine with no within-layer connections
- RBM back-and-forth update: update hidden units given visible units, then update visible units given hidden units
Training
- Train an RBM on the input to form a hidden representation
- Use that hidden representation as the input to train another RBM
- Repeat the previous two steps to stack further layers (sketched below)
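A minimal sketch of this greedy layer-wise procedure, assuming binary units, sigmoid activations, and one-step contrastive divergence (CD-1) for the RBM update; sizes and hyperparameters are illustrative.

```python
import numpy as np

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

def train_rbm(data, n_hidden, epochs=10, lr=0.1, rng=np.random.default_rng(0)):
    n_visible = data.shape[1]
    W = rng.normal(0, 0.01, (n_visible, n_hidden))
    a = np.zeros(n_visible)      # visible biases
    b = np.zeros(n_hidden)       # hidden biases
    for _ in range(epochs):
        # "Back-and-forth" update (one step of contrastive divergence):
        v0 = data
        h0 = sigmoid(v0 @ W + b)                       # hidden given visible
        h0_sample = (rng.random(h0.shape) < h0) * 1.0
        v1 = sigmoid(h0_sample @ W.T + a)              # visible given hidden
        h1 = sigmoid(v1 @ W + b)                       # hidden given reconstruction
        W += lr * (v0.T @ h0 - v1.T @ h1) / len(data)
        a += lr * (v0 - v1).mean(axis=0)
        b += lr * (h0 - h1).mean(axis=0)
    return W, a, b

# Greedy stacking: each RBM's hidden representation is the next RBM's input.
data = (np.random.default_rng(1).random((100, 20)) < 0.5) * 1.0   # toy binary data
x = data
for n_hidden in [15, 10]:
    W, a, b = train_rbm(x, n_hidden)
    x = sigmoid(x @ W + b)        # hidden representation feeds the next layer
```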
Deep convolutional neural networks
- Stride of n (sliding window by n pixels)
- Convolution layer (kernels)
- Max pooling
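A minimal numpy sketch of the three operations listed above, assuming a single-channel image and one kernel; shapes and values are illustrative, not a production implementation.

```python
import numpy as np

def conv2d(image, kernel, stride=1):
    # Slide the kernel over the image, moving `stride` pixels at a time.
    kh, kw = kernel.shape
    oh = (image.shape[0] - kh) // stride + 1
    ow = (image.shape[1] - kw) // stride + 1
    out = np.zeros((oh, ow))
    for i in range(oh):
        for j in range(ow):
            patch = image[i*stride:i*stride+kh, j*stride:j*stride+kw]
            out[i, j] = np.sum(patch * kernel)
    return out

def max_pool(feature_map, size=2):
    # Keep only the largest activation in each non-overlapping size x size window.
    oh, ow = feature_map.shape[0] // size, feature_map.shape[1] // size
    out = np.zeros((oh, ow))
    for i in range(oh):
        for j in range(ow):
            out[i, j] = feature_map[i*size:(i+1)*size, j*size:(j+1)*size].max()
    return out

image = np.random.default_rng(0).random((8, 8))
kernel = np.array([[1., 0.], [0., -1.]])    # a toy 2x2 edge-like kernel
features = conv2d(image, kernel, stride=1)  # 7x7 feature map
pooled = max_pool(features, size=2)         # 3x3 after pooling
print(features.shape, pooled.shape)
```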
Deep Q-Network
Google DeepMind, learning to play Atari 2600 games
- Input: video screen
- Output: Q(s,a) for each possible action a
- Action-value function
- Value of taking action a when in state s
- Reward: game score
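A minimal sketch of the Q-learning target a DQN is trained toward. For brevity it assumes a tiny linear Q-function and a single transition, rather than a deep convolutional network with experience replay; all names and numbers are illustrative.

```python
import numpy as np

n_state_features, n_actions = 4, 3
rng = np.random.default_rng(0)
W = rng.normal(0, 0.1, (n_state_features, n_actions))  # Q(s, a) = s @ W

def q_values(state):
    return state @ W                     # one value per action

# One observed transition (state, action, reward, next_state).
s = rng.random(n_state_features)
a = 1
r = 1.0                                  # e.g. change in game score
s_next = rng.random(n_state_features)
gamma = 0.99                             # discount factor

# Q-learning target: reward plus discounted value of the best next action.
target = r + gamma * q_values(s_next).max()
td_error = target - q_values(s)[a]

# Gradient step on the squared TD error, only for the taken action's column.
lr = 0.01
W[:, a] += lr * td_error * s
```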
Deep recurrent neural networks
- Feedforward: No memory of past input
- Recurrent:
- Good: Past input affects present output
- Bad: Cannot remember far into the past
Backpropagation through time (BPTT):
- Can unfold recurrent loop: make it into a feedforward net
- Use the same backprop algorithm for training
- Still cannot remember far into the past (gradients vanish over many unrolled steps)
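A minimal sketch of unfolding in time, assuming a plain tanh recurrence with random illustrative weights: the same weights are reused at every step, so the unrolled computation is a deep feedforward chain, and the backward pass hints at why distant-past gradients tend to vanish.

```python
import numpy as np

rng = np.random.default_rng(0)
n_in, n_hidden, T = 3, 5, 10
W_xh = rng.normal(0, 0.5, (n_in, n_hidden))
W_hh = rng.normal(0, 0.5, (n_hidden, n_hidden))

xs = rng.random((T, n_in))               # an input sequence of length T
h = np.zeros(n_hidden)
hs = []
for t in range(T):                       # the "unfolded" feedforward chain
    h = np.tanh(xs[t] @ W_xh + h @ W_hh)
    hs.append(h)

# Backprop through time walks the same chain in reverse: the gradient is
# multiplied by W_hh (and tanh') once per step, so signals from the distant
# past tend to shrink (or blow up).
grad_h = np.ones(n_hidden)               # pretend gradient arriving at the last step
for t in reversed(range(T)):
    grad_h = (grad_h * (1 - hs[t] ** 2)) @ W_hh.T
print("gradient norm after", T, "steps back:", np.linalg.norm(grad_h))
```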
Long Short-Term Memory
- LSTM to the rescue (Hochreiter & Schmidhuber, 1997)
- Built-in recurrent memory that can be written (Input gate), reset (Forget gate), and read out (Output gate)
- Long-term retention possible with LSTM
- Unfold in time and use backprop as usual
- Applications: sequence classification, sequence translation, sequence prediction
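A minimal sketch of a single LSTM cell step with the standard gate equations; the numpy implementation, weight shapes, and names are illustrative assumptions, not taken from the notes.

```python
import numpy as np

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

def lstm_step(x, h_prev, c_prev, W, b):
    # Concatenate previous hidden state and current input (a common convention).
    z = np.concatenate([h_prev, x])
    i = sigmoid(W["i"] @ z + b["i"])     # input gate: how much to write
    f = sigmoid(W["f"] @ z + b["f"])     # forget gate: how much to reset
    o = sigmoid(W["o"] @ z + b["o"])     # output gate: how much to read out
    g = np.tanh(W["g"] @ z + b["g"])     # candidate memory content
    c = f * c_prev + i * g               # update the built-in memory cell
    h = o * np.tanh(c)                   # expose (part of) the memory as output
    return h, c

rng = np.random.default_rng(0)
n_in, n_hidden = 4, 6
W = {k: rng.normal(0, 0.1, (n_hidden, n_hidden + n_in)) for k in "ifog"}  # illustrative weights
b = {k: np.zeros(n_hidden) for k in "ifog"}

h = np.zeros(n_hidden); c = np.zeros(n_hidden)
for x in rng.random((10, n_in)):         # run the cell over a short sequence
    h, c = lstm_step(x, h, c, W, b)
```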