Public speaking course notes Read "Dynamo, Amazon’s Highly Available Key-value Store" Read "Bigtable, A Distributed Storage System for Structured Data" Read "Streaming Systems" 3, Watermarks Read "Streaming Systems" 1&2, Streaming 101 Read "F1, a distributed SQL database that scales" Read "Zanzibar, Google’s Consistent, Global Authorization System" Read "Spanner, Google's Globally-Distributed Database" Read "Designing Data-intensive applications" 12, The Future of Data Systems IOS development with Swift Read "Designing Data-intensive applications" 10&11, Batch and Stream Processing Read "Designing Data-intensive applications" 9, Consistency and Consensus Read "Designing Data-intensive applications" 8, Distributed System Troubles Read "Designing Data-intensive applications" 7, Transactions Read "Designing Data-intensive applications" 6, Partitioning Read "Designing Data-intensive applications" 5, Replication Read "Designing Data-intensive applications" 3&4, Storage, Retrieval, Encoding Read "Designing Data-intensive applications" 1&2, Foundation of Data Systems Three cases of binary search TAMU Operating System 2 Memory Management TAMU Operating System 1 Introduction Overview in cloud computing 2 TAMU Operating System 7 Virtualization TAMU Operating System 6 File System TAMU Operating System 5 I/O and Disk Management TAMU Operating System 4 Synchronization TAMU Operating System 3 Concurrency and Threading TAMU Computer Networks 5 Data Link Layer TAMU Computer Networks 4 Network Layer TAMU Computer Networks 3 Transport Layer TAMU Computer Networks 2 Application Layer TAMU Computer Networks 1 Introduction Overview in distributed systems and cloud computing 1 A well-optimized Union-Find implementation, in Java A heap implementation supporting deletion TAMU Advanced Algorithms 3, Maximum Bandwidth Path (Dijkstra, MST, Linear) TAMU Advanced Algorithms 2, B+ tree and Segment Intersection TAMU Advanced Algorithms 1, BST, 2-3 Tree and Heap TAMU AI, Searching problems Factorization Machine and Field-aware Factorization Machine for CTR prediction TAMU Neural Network 10 Information-Theoretic Models TAMU Neural Network 9 Principal Component Analysis TAMU Neural Network 8 Neurodynamics TAMU Neural Network 7 Self-Organizing Maps TAMU Neural Network 6 Deep Learning Overview TAMU Neural Network 5 Radial-Basis Function Networks TAMU Neural Network 4 Multi-Layer Perceptrons TAMU Neural Network 3 Single-Layer Perceptrons Princeton Algorithms P1W6 Hash Tables & Symbol Table Applications Stanford ML 11 Application Example Photo OCR Stanford ML 10 Large Scale Machine Learning Stanford ML 9 Anomaly Detection and Recommender Systems Stanford ML 8 Clustering & Principal Component Analysis Princeton Algorithms P1W5 Balanced Search Trees TAMU Neural Network 2 Learning Processes TAMU Neural Network 1 Introduction Stanford ML 7 Support Vector Machine Stanford ML 6 Evaluate Algorithms Princeton Algorithms P1W4 Priority Queues and Symbol Tables Stanford ML 5 Neural Networks Learning Princeton Algorithms P1W3 Mergesort and Quicksort Stanford ML 4 Neural Networks Basics Princeton Algorithms P1W2 Stack and Queue, Basic Sorts Stanford ML 3 Classification Problems Stanford ML 2 Multivariate Regression and Normal Equation Princeton Algorithms P1W1 Union and Find Stanford ML 1 Introduction and Parameter Learning

Stanford ML 1 Introduction and Parameter Learning

2016-12-23

introduction

supervised learning

having the idea that there is a relationship between input and output

regression

continuous value

classification

discreted valued output (0 or 1,2,3,…)

unsupervised learning

Derive structure from data where the effect of the vars are not necessarily known
Derive this structure by clustering the data based on the relationship among vars
No feedback based on the prediction results

  • cocktail party problem
[w,s,v] = svd((repmat(sum(x.*x,1),size(x,1),1).*x)*x');

model and cost function

model representation

m = number of training examples
x = input variable/ features
y = output variable/”target” variable
= one training examples
= the ith training example

procedure

training set learning algorithm h(hypothesis)
x h y
h maps from x’s to y’s

hypothesis

linear regression with one variable.
univariate(one variable) linear regression

can be simplified as
: parameters

cost function

goal: minimize by adjusting and
hypothesis:
cost function(squared error function):
parameters:

is as a convenience for the computation of the gradient descent

parameter learning

gradient descent

have some function
want minimize this function by adjusting

outline

start with some
keep changing to reduce until ending up at a minimum

algorithm

repeat until convergence {
\[\theta_{j}:=\theta_{j}-\alpha\frac{\partial}{\partial\theta_{j}}J(\theta_{0},\theta_{1}) \text{ for } j=0 \text{ and } j=1\]
}
correct: simultaneous update



gradient descent intuition


is the learning rate or the coefficient of length of a step
is the derivative or gradient or slope with respect to

as we approach a local minimum, gradient descent will automatically take smaller steps (the gradient becomes smaller), so no need to decrease over time.

gradient descent for linear regression

apply gradient descent to minimize cost function
gradient descent
repeat until convergence {

}
linear regression


convex function means a bowl-shaped function, having only one local minimum

“Batch” gradient descent

each step of gradient descent uses all m the training examples


Creative Commons License
Melon blog is created by melonskin. This work is licensed under a Creative Commons Attribution-NonCommercial 4.0 International License.
© 2016-2025. All rights reserved by melonskin. Powered by Jekyll.