
TAMU Neural Network 3 Single-Layer Perceptrons

2017-02-13

Adaptive Filter

Training set (input/output pair):
\[\mathcal{T}: \{\textbf{x}(i), d(i);\ i = 1,2,\dots,n\}\]

  • Filtering process: generation of the output based on the input, together with the error against the desired response:
\[y(n) = \textbf{x}^T(n)\textbf{w}(n), \qquad e(n) = d(n) - y(n)\]
  • Adaptive process: automatic adjustment of the weights w(n) to reduce the error e(n).

Optimal solution: the weight vector that minimizes the cost function.

Steepest Descent

Define the gradient vector as the derivative of the cost function with respect to the weights, evaluated at the current weight vector:
\[g(n) = \frac{\partial \mathcal{E}(\textbf{w})}{\partial \textbf{w}} \bigg\vert_{\textbf{w}=\textbf{w}(n)}\]
The weights are then updated in the direction opposite to the gradient, with learning rate η:
\[\textbf{w}(n+1)=\textbf{w}(n)-\eta g(n)\]
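As a concrete illustration, here is a minimal Python sketch of steepest descent on a mean-squared-error cost for a linear filter; the function name, data shapes, and the choice of MSE as the cost are assumptions for this example, not part of the notes.

```python
import numpy as np

def steepest_descent(X, d, eta=0.1, n_iters=500):
    """Batch steepest descent on the MSE cost of a linear filter.

    X: (N, m) matrix of inputs, d: (N,) desired responses (illustrative shapes).
    """
    N, m = X.shape
    w = np.zeros(m)
    for _ in range(n_iters):
        e = d - X @ w              # errors e(i) = d(i) - x(i)^T w
        g = -(X.T @ e) / N         # gradient of (1/2N) * sum e(i)^2 w.r.t. w
        w = w - eta * g            # w(n+1) = w(n) - eta * g(n)
    return w
```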

Newton’s Method

An extension of steepest descent, where the second-order term of the Taylor series expansion of the cost function is also used; the gradient is scaled by the inverse of the Hessian matrix H(n):
\[\textbf{w}(n+1)=\textbf{w}(n)-H^{-1}(n)g(n)\]
Generally faster than steepest descent.
Needs to satisfy certain conditions, such as the Hessian matrix being positive definite (x^T H(n) x > 0 for an arbitrary nonzero x), so that its inverse exists.

Finally, for a linear least-squares problem (linear model, quadratic cost), the minimum is reached in a single step and has the closed form:
\[\textbf{w} = (X^TX)^{-1}X^T d\]

  • X (the data matrix of inputs) does not need to be a square matrix.
  • The output is a linear function of the inputs.
  • No iteration is needed; see the numerical check below.
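A quick numerical check of this closed form; the synthetic data and names below are illustrative, and np.linalg.solve is applied to the normal equations rather than forming an explicit inverse:

```python
import numpy as np

rng = np.random.default_rng(0)
X = rng.normal(size=(100, 3))                   # 100 samples, 3 inputs (not square)
w_true = np.array([1.5, -2.0, 0.5])
d = X @ w_true + 0.01 * rng.normal(size=100)    # desired responses with small noise

# w = (X^T X)^{-1} X^T d, computed by solving the normal equations
w = np.linalg.solve(X.T @ X, X.T @ d)
print(w)                                        # close to w_true, with no iteration
```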

Least-Mean-Square Algorithm

The weight update is done with only one input/output pair at a time, using the instantaneous error e(n) = d(n) - y(n):
\[\textbf{w}(n+1)=\textbf{w}(n)+\eta\, e(n)\textbf{x}(n)\]

With a small learning rate, the adaptation proceeds through many small weight changes.

  • acts like a low-pass filter, smoothing the noisy instantaneous gradient estimates
  • simple, model-independent, robust
  • follows a stochastic approximation of the steepest-descent direction
  • slow convergence
  • sensitive to the condition number (eigenvalue spread) of the input correlation matrix
  • converges in the mean when 0 < η < 2/λ_max, where λ_max is the largest eigenvalue of the input correlation matrix

Convergence can be improved by adopting a time-varying learning rate η(n), e.g. a search-then-converge schedule such as η(n) = η₀/(1 + n/τ).
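A minimal Python sketch of the LMS update; the function name, data layout, and epoch loop are illustrative assumptions:

```python
import numpy as np

def lms(X, d, eta=0.05, n_epochs=10):
    """Least-mean-square: update the weights with one (x, d) pair at a time."""
    w = np.zeros(X.shape[1])
    for _ in range(n_epochs):
        for x, target in zip(X, d):
            e = target - w @ x       # instantaneous error e(n) = d(n) - y(n)
            w = w + eta * e * x      # w(n+1) = w(n) + eta * e(n) * x(n)
    return w
```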

Perceptron

\[v=\sum_{i=1}^{m} w_i x_i + b\]
\[y = \phi(v) = \begin{cases} 1 & \text{ if } v \gt 0 \\ 0 & \text{ if } v \le 0 \end{cases}\]

It is impossible for a single unit to represent XOR or EQUIV, because these functions are not linearly separable.

Perceptron learning rule:
\[w(n+1) =w(n) +\eta (n) e(n) x(n)\]
where the error is e(n) = d(n) - y(n).

When an input x is misclassified, the weight vector tilts to fit it, and the decision boundary, which is perpendicular to the weight vector, moves accordingly.
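A small Python sketch of the rule above on the AND function, which is linearly separable; the helper name, bias handling, and toy data are my own choices for illustration:

```python
import numpy as np

def perceptron_train(X, d, eta=1.0, n_epochs=20):
    """Single perceptron trained with w(n+1) = w(n) + eta * e(n) * x(n).

    X: (N, m) inputs, d: (N,) desired outputs in {0, 1}.
    The bias b is absorbed by appending a constant 1 to every input.
    """
    Xb = np.hstack([X, np.ones((X.shape[0], 1))])
    w = np.zeros(Xb.shape[1])
    for _ in range(n_epochs):
        for x, target in zip(Xb, d):
            y = 1 if w @ x > 0 else 0          # hard limiter phi(v)
            w = w + eta * (target - y) * x     # no change when e(n) = 0
    return w

# AND is linearly separable, so the perceptron converges on it
X = np.array([[0, 0], [0, 1], [1, 0], [1, 1]])
d = np.array([0, 0, 0, 1])
w = perceptron_train(X, d)
print([1 if w @ np.append(x, 1) > 0 else 0 for x in X])   # -> [0, 0, 0, 1]
```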

Perceptron Convergence Theorem

Assume the training set is linearly separable by some weight vector w_0; let α be the minimum of w_0^T x over the class-1 training inputs and β the maximum squared norm of the training inputs. The squared norm of the weight vector after n error-correction updates (starting from zero weights) is then bounded on both sides:
\[\frac{n^2\alpha^2}{\vert \vert w_0 \vert \vert^2} \le \vert \vert w(n+1) \vert \vert ^2 \le n\beta \]

Thus, the number of error-correction iterations cannot grow beyond a certain n_max, after which all inputs will be correctly classified:
\[n_{max} = \frac{\beta \vert \vert w_0 \vert \vert^2}{\alpha^2}\]
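This value comes from equating the quadratically growing lower bound with the linearly growing upper bound:
\[\frac{n_{max}^2\alpha^2}{\vert \vert w_0 \vert \vert^2} = n_{max}\beta \quad \Rightarrow \quad n_{max} = \frac{\beta \vert \vert w_0 \vert \vert^2}{\alpha^2}\]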

