introduction
supervised learning
we already have the idea that there is a relationship between the input and the output
regression
predict a continuous-valued output
classification
predict a discrete-valued output (0 or 1, 2, 3, …)
unsupervised learning
Derive structure from data where the effect of the variables is not necessarily known  
Derive this structure by clustering the data based on the relationships among the variables  
No feedback based on the prediction results
- cocktail party problem
% Octave one-liner from the lecture: separates the mixed sources in x via SVD (ICA demo)
[w,s,v] = svd((repmat(sum(x.*x,1),size(x,1),1).*x)*x');
model and cost function
model representation
m = number of training examples  
x = input variable / features  
y = output variable / "target" variable  
$(x, y)$ = one training example  
$(x^{(i)}, y^{(i)})$ = the $i$-th training example
procedure
training set → learning algorithm → h (hypothesis)  
x → h → y  
h maps from x’s to y’s
hypothesis
linear regression with one variable.  
univariate (one variable) linear regression  
$h_{\theta}(x) = \theta_{0} + \theta_{1}x$  
$h_{\theta}(x)$ can be simplified as $h(x)$  
$\theta_{0}, \theta_{1}$: parameters
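As a quick illustration, the hypothesis can be written as an Octave anonymous function (the parameter values below are assumed for the example):

```octave
% minimal sketch of the hypothesis as an anonymous function
theta0 = 1; theta1 = 2;        % example parameter values (assumed)
h = @(x) theta0 + theta1 * x;  % h(x) = theta0 + theta1 * x
h(3)                           % ans = 7
```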
cost function
goal: minimize $J(\theta_{0}, \theta_{1})$ by adjusting $\theta_{0}$ and $\theta_{1}$  
hypothesis: $h_{\theta}(x) = \theta_{0} + \theta_{1}x$  
cost function (squared error function): $J(\theta_{0}, \theta_{1}) = \frac{1}{2m}\sum_{i=1}^{m}\left(h_{\theta}(x^{(i)}) - y^{(i)}\right)^{2}$  
parameters: $\theta_{0}, \theta_{1}$  
goal: $\min_{\theta_{0}, \theta_{1}} J(\theta_{0}, \theta_{1})$
the $\frac{1}{2}$ is a convenience for the computation of the gradient descent: it cancels the factor of 2 that appears when differentiating the squared term
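A minimal Octave sketch of evaluating this cost function (the data and parameter values are assumed for illustration):

```octave
% minimal sketch: the squared error cost on toy data
x = [1; 2; 3];                 % toy inputs (assumed)
y = [2; 4; 6];                 % toy targets (assumed)
m = length(y);
theta0 = 0; theta1 = 2;        % example parameter values (assumed)
h = theta0 + theta1 * x;       % hypothesis on every training example
J = (1 / (2 * m)) * sum((h - y).^2)   % J = 0, since h matches y exactly
```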
parameter learning
gradient descent
have some function $J(\theta_{0}, \theta_{1})$  
want to minimize this function $J$ by adjusting $\theta_{0}, \theta_{1}$
outline
start with some $\theta_{0}, \theta_{1}$  
keep changing $\theta_{0}, \theta_{1}$ to reduce $J(\theta_{0}, \theta_{1})$ until ending up at a minimum
algorithm
repeat until convergence {  
\[\theta_{j}:=\theta_{j}-\alpha\frac{\partial}{\partial\theta_{j}}J(\theta_{0},\theta_{1}) \text{ for } j=0 \text{ and } j=1\]  
}  
correct: simultaneous update  
\[\text{temp0} := \theta_{0}-\alpha\frac{\partial}{\partial\theta_{0}}J(\theta_{0},\theta_{1})\]  
\[\text{temp1} := \theta_{1}-\alpha\frac{\partial}{\partial\theta_{1}}J(\theta_{0},\theta_{1})\]  
\[\theta_{0} := \text{temp0}, \quad \theta_{1} := \text{temp1}\]
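A minimal sketch of one simultaneous update step, using the toy function $J(\theta_{0}, \theta_{1}) = \theta_{0}^{2} + \theta_{1}^{2}$ (the function, starting point, and learning rate are assumed for illustration):

```octave
% one simultaneous update step on J(theta0, theta1) = theta0^2 + theta1^2 (assumed)
theta0 = 3; theta1 = 4; alpha = 0.1;    % starting point and learning rate (assumed)
temp0 = theta0 - alpha * (2 * theta0);  % dJ/dtheta0 = 2 * theta0
temp1 = theta1 - alpha * (2 * theta1);  % dJ/dtheta1 = 2 * theta1
theta0 = temp0;   % assign only after both temporaries are computed,
theta1 = temp1;   % so theta1's update does not see the new theta0
```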
gradient descent intuition
\[\theta_{1}:=\theta_{1}-\alpha\frac{d}{d\theta_{1}}J(\theta_{1})\]  
$\alpha$ is the learning rate, the coefficient controlling the length of a step  
$\frac{d}{d\theta_{1}}J(\theta_{1})$ is the derivative (gradient, slope) with respect to $\theta_{1}$
as we approach a local minimum, gradient descent will automatically take smaller steps (the gradient becomes smaller), so there is no need to decrease $\alpha$ over time.
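A toy illustration of this, using $J(\theta) = \theta^{2}$ (the function, starting point, and learning rate are assumed):

```octave
% on J(theta) = theta^2 the steps shrink on their own, because the
% derivative 2*theta shrinks as theta approaches the minimum at 0
theta = 10; alpha = 0.1;       % starting point and learning rate (assumed)
for iter = 1:5
    step = alpha * 2 * theta;  % alpha times dJ/dtheta
    theta = theta - step;
    printf("step %d: size %.4f, theta %.4f\n", iter, step, theta);
end
```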
gradient descent for linear regression
apply gradient descent to minimize the cost function $J(\theta_{0}, \theta_{1})$  
gradient descent  
repeat until convergence {  
\[\theta_{j}:=\theta_{j}-\alpha\frac{\partial}{\partial\theta_{j}}J(\theta_{0},\theta_{1}) \text{ for } j=0 \text{ and } j=1\]  
}  
linear regression  
\[\theta_{0}:=\theta_{0}-\alpha\frac{1}{m}\sum_{i=1}^{m}\left(h_{\theta}(x^{(i)})-y^{(i)}\right)\]  
\[\theta_{1}:=\theta_{1}-\alpha\frac{1}{m}\sum_{i=1}^{m}\left(h_{\theta}(x^{(i)})-y^{(i)}\right)x^{(i)}\]  
the cost function for linear regression is a convex function: a bowl-shaped function with only one local minimum, which is therefore the global minimum
“Batch” gradient descent
each step of gradient descent uses all m training examples, as in the sketch below
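A minimal Octave sketch of batch gradient descent for univariate linear regression (the toy data, learning rate, and iteration count are assumed for illustration):

```octave
% batch gradient descent for h_theta(x) = theta0 + theta1 * x
x = [1; 2; 3; 4; 5];           % toy inputs (assumed)
y = [3; 5; 7; 9; 11];          % toy targets (assumed): y = 1 + 2x
m = length(y);
X = [ones(m, 1), x];           % prepend a column of ones for theta0
theta = zeros(2, 1);           % [theta0; theta1]
alpha = 0.05;                  % learning rate (assumed)
for iter = 1:2000
    h = X * theta;                                  % uses all m examples
    theta = theta - alpha * (1 / m) * X' * (h - y); % simultaneous update
end
theta                          % approaches [1; 2]
```

The vectorized update `X' * (h - y)` computes both partial derivatives at once over the whole training set, which is exactly what makes this the "batch" variant.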

