
Stanford ML 8 Clustering & Principal Component Analysis

2017-02-01

Clustering

Unsupervised Learning Introduction

K-Means Algorithm

Input:

  • K (number of clusters)
  • Training set \(\{x^{(1)}, x^{(2)}, \dots, x^{(m)}\}\)

\(x^{(i)} \in \mathbb{R}^n\) (drop the \(x_0 = 1\) convention)

Algorithm

Randomly initialize K cluster centroids \(\mu_1, \mu_2, \dots, \mu_K \in \mathbb{R}^n\)

Repeat {
for \(i\) = 1 to \(m\):
\(c^{(i)}\) := index (from 1 to K) of the cluster centroid closest to \(x^{(i)}\), i.e. the \(k\) that minimizes \(\vert\vert x^{(i)} - \mu_k \vert\vert^2\)
for \(k\) = 1 to \(K\):
\(\mu_k\) := average (mean) of the points assigned to cluster k
}
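A minimal Octave sketch of the two steps above (the names X, mu, c, K, and max_iters are assumptions, not from the lecture; empty clusters are not handled):

% X is m-by-n, mu is K-by-n (initial centroids), c is 1-by-m
for iter = 1:max_iters
  for i = 1:m
    d = sum((mu - X(i,:)).^2, 2);     % squared distance to every centroid
    [~, c(i)] = min(d);               % cluster assignment step
  end
  for k = 1:K
    mu(k,:) = mean(X(c == k, :), 1);  % move centroid step (assumes cluster k is non-empty)
  end
end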

optimization objective

\(c^{(i)}\) = index of cluster (1,2,…,K) to which example \(x^{(i)}\) is currently assigned
\(\mu_k\) = cluster centroid k (\(\mu_k \in \mathbb{R}^n\))
\(\mu_{c^{(i)}}\) = cluster centroid of the cluster to which example \(x^{(i)}\) has been assigned

Optimization objective:
Minimize:
\[J(c^{(1)},\dots,c^{(m)},\mu_1,\dots,\mu_K)=\frac{1}{m} \sum_{i=1}^{m} \vert \vert x^{(i)}-\mu_{c^{(i)}} \vert \vert^2\]
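With the same assumed matrix names as the sketch above (X is m-by-n, c holds the assignments, mu the centroids), the distortion can be computed in one line:

J = mean(sum((X - mu(c,:)).^2, 2));   % average squared distance to the assigned centroid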

random initialization

Should have \(K < m\)

Randomly pick K training examples

Set \(\mu_1, \dots, \mu_K\) equal to these K examples

For i = 1 to 100 (usually 50 - 1000) {
Randomly initialize K-means.
Run K-means. Get \(c^{(1)},\dots,c^{(m)},\mu_1,\dots,\mu_K\).
Compute cost function (distortion) \(J(c^{(1)},\dots,c^{(m)},\mu_1,\dots,\mu_K)\)
}

Pick the clustering that gave the lowest cost \(J\)
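A sketch of this loop in Octave; the K-means inner loop itself is elided, and the names follow the earlier sketch:

best_J = Inf;
for t = 1:100
  mu = X(randperm(m, K), :);             % K distinct training examples as initial centroids
  % ... run the K-means loop from the Algorithm section here, updating c and mu ...
  J = mean(sum((X - mu(c,:)).^2, 2));    % distortion of this run
  if J < best_J
    best_J = J;  best_c = c;  best_mu = mu;   % keep the best clustering so far
  end
end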

choose the number of clusters

Usually chosen by hand (human insight), based on the purpose of the downstream task.
The elbow method can also be used: plot the cost \(J\) against K and look for an "elbow", though in practice the curve is often ambiguous.

Motivation

data compression

Reduce data from 2D to 1D
Reduce data from 3D to 2D; project the data onto a plane

visualization

Principal Component Analysis

problem formulation

For the problem of dimensionality reduction, PCA is the most popular method.

Try to find a lower-dimensional surface onto which to project the data so as to minimize the projection error (the orthogonal distance between each point and the surface).

Reduce from n dimensions to k dimensions: find k vectors \(u^{(1)}, u^{(2)}, \dots, u^{(k)}\) onto which to project the data, so as to minimize the projection error.

Different from linear regression: linear regression minimizes the vertical squared error between predictions and \(y\), while PCA minimizes the orthogonal projection error and treats all features symmetrically (there is no \(y\) to predict).

Before applying PCA:
perform mean normalization and feature scaling so that the features have zero mean and comparable ranges.

algorithm

data preprocessing

Training set: \(x^{(1)}, x^{(2)}, \dots, x^{(m)}\)
Preprocessing (feature scaling/mean normalization):

Replace each \(x_j^{(i)}\) with \(x_j^{(i)} - \mu_j\), where \(\mu_j = \frac{1}{m}\sum_{i=1}^{m} x_j^{(i)}\)
Scale features to have comparable ranges of values (e.g. divide by the standard deviation \(s_j\))
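A short Octave sketch of this preprocessing (X is the m-by-n design matrix; using the standard deviation as the scaling factor is one common choice, assumed here):

mu = mean(X);              % 1-by-n vector of per-feature means
s  = std(X);               % per-feature standard deviations (or max - min)
X_norm = (X - mu) ./ s;    % zero mean, comparable ranges (relies on broadcasting)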

Compute “covariance matrix”:
\[\Sigma = \frac{1}{m} \sum_{i=1}^{m}(x^{(i)})(x^{(i)})^T \]
This matrix is symmetric positive semidefinite.
Compute “eigenvectors” of matrix \(\Sigma\):

% X is the m-by-n design matrix (one training example per row)
Sigma = (1/m) * X' * X;        % n-by-n covariance matrix
[U, S, V] = svd(Sigma);        % columns of U are the principal components
% eig(Sigma) yields the same eigenvectors (up to sign and ordering) since Sigma
% is symmetric positive semidefinite; svd returns them ordered by decreasing variance
Ureduce = U(:, 1:k);           % keep the first k columns of U
z = Ureduce' * x;              % project one example x (n-by-1) onto k dimensions

Take the first k columns of the U matrix to form \(U_{reduced}\)
\[z=U_{reduced}^T x\]
z will be a vector in \(\mathbb{R}^k\)

applying PCA

reconstruction from compressed representation

The approximate value on the projection surface:
\[x_{approx}^{(i)} = U_{reduced} z^{(i)}\]
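In Octave, with Ureduce and z from the snippet above:

x_approx = Ureduce * z;    % n-by-1 approximation of x; exact only when k = n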

choosing the number of principal components

Choosing k

Average squared projection error: \(\frac{1}{m}\sum_{i=1}^{m} \vert \vert x^{(i)} - x_{approx}^{(i)} \vert \vert^2\)
Total variation in the data: \(\frac{1}{m}\sum_{i=1}^{m} \vert \vert x^{(i)} \vert \vert^2\)

Choose k to be the smallest value such that
\[\frac{\frac{1}{m}\sum_{i=1}^{m} \vert \vert x^{(i)} - x_{approx}^{(i)} \vert \vert^2}{\frac{1}{m}\sum_{i=1}^{m} \vert \vert x^{(i)} \vert \vert^2} \le 0.01\]
99% of variance is retained.

Algorithm:
Try PCA with \(k = 1, 2, 3, \dots\)
Compute \(U_{reduced}, z^{(1)},\dots,z^{(m)}, x_{approx}^{(1)},\dots,x_{approx}^{(m)}\)
Check whether the ratio is less than or equal to 0.01

[U,S,V] = svd(Sigma)

For a given k, the ratio can be computed from the diagonal entries of S as:
\[1-\frac{\sum_{i=1}^{k} S_{ii}}{\sum_{i=1}^{n} S_{ii}} \le 0.01\]
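A sketch of choosing k this way from the diagonal of S (Sigma and n as defined above):

[U, S, V] = svd(Sigma);
s = diag(S);                               % diagonal entries S_11, ..., S_nn
k = find(cumsum(s) / sum(s) >= 0.99, 1);   % smallest k retaining 99% of the variance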

advice for applying PCA

supervised learning speedup

Extract inputs: from the labeled training set \((x^{(1)},y^{(1)}),\dots,(x^{(m)},y^{(m)})\), take only the inputs \(x^{(1)},\dots,x^{(m)} \in \mathbb{R}^n\)

Run PCA on this (now unlabeled) data set to get \(z^{(1)},\dots,z^{(m)} \in \mathbb{R}^k\)

New training set: \((z^{(1)},y^{(1)}),\dots,(z^{(m)},y^{(m)})\)

The mapping \(x^{(i)} \rightarrow z^{(i)}\) should be defined by running PCA only on the training set. The same mapping can then be applied to the cross-validation and test sets.
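A sketch of this pipeline in Octave; Xtrain, Xval, Xtest, and k are assumed names, and the mapping (mu, Ureduce) is learned from the training set only:

m = size(Xtrain, 1);
mu = mean(Xtrain);                                % learned on the training set only
Sigma = (1/m) * (Xtrain - mu)' * (Xtrain - mu);
[U, S, V] = svd(Sigma);
Ureduce = U(:, 1:k);
Ztrain = (Xtrain - mu) * Ureduce;                 % m-by-k compressed training inputs
Zval   = (Xval  - mu) * Ureduce;                  % apply the SAME mapping to the CV set
Ztest  = (Xtest - mu) * Ureduce;                  % and to the test set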

It is not a good idea to use PCA to reduce the number of features just to prevent overfitting; PCA discards information without looking at the labels \(y\), so use regularization instead.

Try doing whatever you want to do without PCA first; only introduce PCA if the raw data turns out to be too slow or too large to work with.

