
Stanford ML 4 Neural Networks Basics

2017-01-04

neural networks

If there are many original features, the number of combined quadratic or cubic features is far larger, so fitting a non-linear hypothesis this way is computationally expensive.
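To get a feel for how fast the feature count grows, here is a small sketch that counts the distinct monomials of a given degree. The function name `count_polynomial_terms` is made up for this illustration, and it counts terms of exactly that degree (for example \(x_1^2\) and \(x_1x_2\) for degree 2).

```python
from math import comb

def count_polynomial_terms(n_features: int, degree: int) -> int:
    """Number of distinct monomials of exactly `degree` in `n_features` variables."""
    # Choosing `degree` variables with repetition allowed: C(n + d - 1, d).
    return comb(n_features + degree - 1, degree)

for n in (10, 100, 1000):
    quadratic = count_polynomial_terms(n, 2)
    cubic = count_polynomial_terms(n, 3)
    print(f"n={n}: quadratic terms={quadratic}, cubic terms={cubic}")
```

With 100 original features there are already 5050 quadratic terms and 171700 cubic terms.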

Origins: algorithms that try to mimic the brain

Neuron in the brain

  1. Dendrite: input wires
  2. Cell body with nucleus: does the computation
  3. Axon: output wires

Use logistic units to represent neurons
Sigmoid activation function \(g(z)=\frac{1}{1+e^{-z}}\)
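A single logistic unit can be sketched in a few lines. This is only an illustration: the names `sigmoid` and `logistic_unit` are mine, and I assume the first weight multiplies a constant input of 1 (the bias).

```python
import math

def sigmoid(z: float) -> float:
    """g(z) = 1 / (1 + e^{-z}), split by sign so math.exp never overflows."""
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    ez = math.exp(z)
    return ez / (1.0 + ez)

def logistic_unit(weights, inputs):
    """Output of one unit; weights[0] is assumed to be the bias weight."""
    z = weights[0] + sum(w * x for w, x in zip(weights[1:], inputs))
    return sigmoid(z)

# Hypothetical weights, chosen only to show the call.
print(logistic_unit([0.0, 1.0, -1.0], [0.5, 0.5]))
```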

Introduce several layers; each layer receives its input from the previous layer and sends its output to the next layer. The layers between the input and output layers are called hidden layers.

\(a_i^{(j)}\) = activation of unit i in layer j
\(\Theta^{(j)}\) = matrix of weights controlling the function mapping from layer j to layer j+1. Its number of rows is the number of units in layer j+1; its number of columns is 1 + the number of units in layer j (the extra column is for the bias).

Add \(a_0^{(j)}=1\) as a bias unit
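The dimension rule is easy to check mechanically. A tiny sketch, where `theta_shapes` is a hypothetical helper and `layer_sizes` lists the units per layer without the bias units:

```python
def theta_shapes(layer_sizes):
    """Return (rows, cols) of each weight matrix between consecutive layers."""
    # rows = units in the next layer, cols = units in this layer + 1 for the bias
    return [(n_out, n_in + 1) for n_in, n_out in zip(layer_sizes, layer_sizes[1:])]

# A network with 3 inputs, 3 hidden units and 1 output unit.
print(theta_shapes([3, 3, 1]))
```

For that network, \(\Theta^{(1)}\) is 3 x 4 and \(\Theta^{(2)}\) is 1 x 4.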

vectorized form

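Computing the activations layer by layer, from the input layer to the output layer, is called forward propagation.

A neural network learns its own features: the hidden-layer activations act as the input features of the final logistic unit.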

\(a^{(1)} = [a_0^{(1)}, a_1^{(1)}, a_2^{(1)}, a_3^{(1)}]^T\), where \(a_0^{(1)} = 1\) is added
\(z^{(2)}=[z_1^{(2)},z_2^{(2)},z_3^{(2)}]^T\)

Architecture: how the neurons are connected to each other

\(z^{(2)}=\Theta^{(1)}a^{(1)}\)
\(a^{(2)}=g(z^{(2)})\)

Add \(a_0^{(2)}=1\)
\(z^{(3)}=\Theta^{(2)}a^{(2)}\)
\(h_{\Theta}(x)=a^{(3)}=g(z^{(3)})\)
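The equations above translate almost line by line into code. Below is a minimal sketch in plain Python (no NumPy), assuming each \(\Theta^{(j)}\) is stored as a list of rows whose first entry is the bias weight. The function names and the example weights are made up for illustration.

```python
import math

def sigmoid(z: float) -> float:
    """g(z) = 1 / (1 + e^{-z}), split by sign so math.exp never overflows."""
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    ez = math.exp(z)
    return ez / (1.0 + ez)

def forward_propagate(thetas, x):
    """Return the output activations h_Theta(x) for input x (without bias)."""
    a = list(x)
    for theta in thetas:
        a_with_bias = [1.0] + a  # add a_0 = 1
        # z = Theta * a, one dot product per row of Theta
        z = [sum(w * v for w, v in zip(row, a_with_bias)) for row in theta]
        a = [sigmoid(z_i) for z_i in z]  # a = g(z), applied element-wise
    return a

# Hypothetical weights for a 3-3-1 network, only to show the shapes involved.
theta1 = [[0.1, 0.2, -0.3, 0.4],
          [-0.5, 0.6, 0.7, -0.8],
          [0.9, -1.0, 0.1, 0.2]]
theta2 = [[0.3, -0.2, 0.5, 0.1]]
print(forward_propagate([theta1, theta2], [1.0, 0.0, 2.0]))
```

Each pass through the loop is one layer of the derivation: add the bias, multiply by \(\Theta^{(j)}\), apply \(g\).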

applications

example XOR/XNOR

XNOR gives 1 if \(x_1\) and \(x_2\) are both 0 or both 1.

simple example AND

\(x_1, x_2 \in \left\{ 0,1 \right\}\)
\(y = x_1 \text{ AND } x_2\)

\(g(4.6) \approx 0.99; g(-4.6) \approx 0.01\)

Select parameters \(\Theta\) so that \(h_{\Theta}(x) \approx 1\) when \(x_1 = 1\) and \(x_2 = 1\), and \(h_{\Theta}(x) \approx 0\) otherwise.

Units computing \(x_1 \text{ AND } x_2\), \((\text{ NOT }x_1) \text{ AND } (\text{ NOT } x_2)\), and \(x_1 \text{ OR } x_2\) can be put together to compute \(x_1 \text{ XNOR } x_2\).
Use 3 layers: the first set of parameters computes \(x_1 \text{ AND } x_2\) and \((\text{ NOT }x_1) \text{ AND } (\text{ NOT } x_2)\) in the hidden layer; the second set computes the OR of those two hidden units.

\(\Theta\) values for different logistic units (the first entry is the bias weight):
AND: \(\Theta^{(1)}=[-30, 20, 20]\)
NOR, i.e. \((\text{ NOT }x_1) \text{ AND } (\text{ NOT } x_2)\): \(\Theta^{(1)}=[10, -20, -20]\)
OR: \(\Theta^{(1)}=[-10, 20, 20]\)
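The weights listed above are enough to build the whole XNOR network. A short sketch, where the helper names `unit` and `xnor` are mine and the first weight of each unit is taken as the bias weight:

```python
import math

AND = [-30, 20, 20]
NOR = [10, -20, -20]   # (NOT x1) AND (NOT x2)
OR = [-10, 20, 20]

def sigmoid(z: float) -> float:
    return 1.0 / (1.0 + math.exp(-z))  # |z| stays at most 30 here, so no overflow

def unit(theta, inputs):
    """One logistic unit: g(theta_0 + theta_1 * x_1 + theta_2 * x_2)."""
    return sigmoid(theta[0] + sum(w * v for w, v in zip(theta[1:], inputs)))

def xnor(x1, x2):
    a1 = unit(AND, [x1, x2])   # hidden unit 1
    a2 = unit(NOR, [x1, x2])   # hidden unit 2
    return unit(OR, [a1, a2])  # output unit

for x1 in (0, 1):
    for x2 in (0, 1):
        print(x1, x2, round(xnor(x1, x2)))
```

Rounding the output gives 1 for the inputs (0, 0) and (1, 1) and 0 for the other two, which is the XNOR truth table.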

multiclass classification

A neural network can have multiple output units, one per class.

With 4 output units, each label is a vector with a single 1; for example, \(y^{(i)}=[1,0,0,0]^T\) marks an example of the first class.
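A sketch of how such label vectors can be built, and how a class can be read back from the network outputs. I assume classes are numbered from 0 and that the predicted class is the output unit with the largest activation; both function names are hypothetical.

```python
def one_hot(class_index, num_classes):
    """Label vector with a 1 at class_index (0-based) and 0 elsewhere."""
    return [1 if k == class_index else 0 for k in range(num_classes)]

def predict_class(outputs):
    """Index of the output unit with the largest activation."""
    return max(range(len(outputs)), key=lambda k: outputs[k])

print(one_hot(0, 4))
print(predict_class([0.1, 0.7, 0.15, 0.05]))
```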


Melon blog is created by melonskin. This work is licensed under a Creative Commons Attribution-NonCommercial 4.0 International License.
© 2016-2025. All rights reserved by melonskin. Powered by Jekyll.