Multivariate linear regression
multiple features
notation:
- \(n\) = number of features
- \(m\) = number of training examples
- \(x^{(i)}\) = input (features) of the \(i\)-th training example
- \(x_j^{(i)}\) = value of feature \(j\) in the \(i\)-th training example
Hypothesis
hypothesis: \(h_\theta(x) = \theta_0 + \theta_1 x_1 + \theta_2 x_2 + \cdots + \theta_n x_n\)
define:
- \(x_0 = 1\) (define \(x_0^{(i)} = 1\) for every example \(i\));
- \(x = [x_0, x_1, \ldots, x_n]^T \in \mathbb{R}^{n+1}\); \(\theta = [\theta_0, \theta_1, \ldots, \theta_n]^T \in \mathbb{R}^{n+1}\);
- \(X = \begin{bmatrix} (x^{(1)})^T \\ (x^{(2)})^T \\ \vdots \\ (x^{(m)})^T \end{bmatrix}\); (the training examples are stored in \(X\) row-wise)
- for a case with 3 examples and 2 features: \(X = \begin{bmatrix} 1 & x_1^{(1)} & x_2^{(1)} \\ 1 & x_1^{(2)} & x_2^{(2)} \\ 1 & x_1^{(3)} & x_2^{(3)} \end{bmatrix}\), a \(3 \times 3\) matrix
- the hypothesis is given as \(h_\theta(x) = \theta^T x\), or vectorized over all examples, \(h = X\theta\)
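A minimal Octave sketch of this vectorized form; the feature values and \(\theta\) below are made up for illustration:

```octave
% hypothetical data: 3 training examples, 2 features, plus x0 = 1
X = [1 2104 5;      % each row is one training example [x0, x1, x2]
     1 1416 3;
     1 1534 3];
theta = [0; 0.1; 1];    % hypothetical (n+1)-dimensional parameter vector
h = X * theta;          % m x 1 vector of predictions, h = X*theta
```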
gradient descent for multiple variables
parameters: \(\theta \in \mathbb{R}^{n+1}\), an \((n+1)\)-dimensional vector
cost function: \(J(\theta) = \frac{1}{2m} \sum_{i=1}^{m} \left( h_\theta(x^{(i)}) - y^{(i)} \right)^2\)
gradient descent:
repeat until convergence {
\(\quad \theta_j := \theta_j - \alpha \frac{\partial}{\partial \theta_j} J(\theta)\)
} (simultaneously update for every \(j = 0, \ldots, n\))
new algorithm (plugging in the partial derivative):
repeat until convergence {
\(\quad \theta_j := \theta_j - \alpha \frac{1}{m} \sum_{i=1}^{m} \left( h_\theta(x^{(i)}) - y^{(i)} \right) x_j^{(i)}\)
} (simultaneously update for every \(j = 0, \ldots, n\))
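A minimal Octave sketch of this loop, assuming \(X\) (with the leading column of ones), \(y\), alpha, and num_iters are already defined; the simultaneous update falls out of the vectorized form:

```octave
% batch gradient descent: vectorized, so all theta_j update simultaneously
theta = zeros(size(X, 2), 1);   % initialize theta to zeros, (n+1) x 1
m = length(y);
for iter = 1:num_iters
  h = X * theta;                                 % predictions for all m examples
  theta = theta - (alpha / m) * (X' * (h - y));  % the update rule above
end
```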
gradient descent in practice - feature scaling
make gradient descent work faster
feature scaling
idea: make sure all features are on a similar scale, e.g. by normalizing them
get every feature into approximately a \(-1 \le x_j \le 1\) range
mean normalization
replace \(x_j\) with \(x_j - \mu_j\) to make features have approximately zero mean (do not apply to \(x_0 = 1\))
or use \(x_j := \frac{x_j - \mu_j}{s_j}\) (\(s_j\) is the range \(\max - \min\), or the standard deviation)
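A minimal Octave sketch of mean normalization with the standard deviation as \(s_j\), assuming X here holds only the raw features (the column of ones is prepended afterwards):

```octave
% mean normalization: zero mean and unit standard deviation per feature
mu = mean(X);                  % 1 x n row vector of feature means
sigma = std(X);                % 1 x n row vector of standard deviations
X_norm = (X - mu) ./ sigma;    % element-wise, broadcast over the m rows
X = [ones(size(X, 1), 1), X_norm];   % prepend x0 = 1 after scaling
```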
gradient descent in practice - learning rate
“debugging”
how to make sure gradient descent is working correctly?
plot \(J(\theta)\) vs. no. of iterations
\(J(\theta)\) should decrease after every iteration
declare convergence if \(J(\theta)\) decreases by less than a small value like \(10^{-3}\) in one iteration
if \(J(\theta)\) is increasing, try a smaller \(\alpha\)
if \(\alpha\) is too small, convergence will be slow
learning rate
how to choose the learning rate \(\alpha\)?
try …, 0.001, 0.003, 0.01, 0.03, 0.1, 0.3, 1,…
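A hedged sketch combining the two ideas: for each candidate \(\alpha\), run the vectorized update from above for a fixed number of iterations and plot \(J(\theta)\) against the iteration number (X and y as before; 50 iterations is an arbitrary choice):

```octave
% compare convergence across candidate learning rates
alphas = [0.001 0.003 0.01 0.03 0.1 0.3 1];
num_iters = 50;
m = length(y);
hold on;
for alpha = alphas
  theta = zeros(size(X, 2), 1);
  J_history = zeros(num_iters, 1);
  for iter = 1:num_iters
    theta = theta - (alpha / m) * (X' * (X * theta - y));
    J_history(iter) = sum((X * theta - y) .^ 2) / (2 * m);   % J(theta)
  end
  plot(1:num_iters, J_history);   % should fall smoothly; if it blows up, shrink alpha
end
xlabel('number of iterations');
ylabel('J(\theta)');
hold off;
```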
features and polynomial regression
if fitting the cubic polynomial \(h_\theta(x) = \theta_0 + \theta_1 x + \theta_2 x^2 + \theta_3 x^3\), create new features \(x_1 = x\), \(x_2 = x^2\), \(x_3 = x^3\) and apply linear regression
the new features can be arbitrary functions of \(x\), like \(\sqrt{x}\)
do the mean normalization and scaling carefully, since \(x\), \(x^2\), and \(x^3\) have very different ranges
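A minimal Octave sketch, assuming the raw input x is an m x 1 column vector:

```octave
% polynomial regression: treat x, x.^2, x.^3 as three separate features
X_poly = [x, x .^ 2, x .^ 3];         % m x 3 matrix of new features
mu = mean(X_poly);
sigma = std(X_poly);
X_poly = (X_poly - mu) ./ sigma;      % scaling matters: the ranges differ wildly
X = [ones(length(x), 1), X_poly];    % prepend x0 = 1, then fit as before
```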
computing parameters analytically
normal equation
method to solve for \(\theta\) analytically
solve for \(\theta\): \(\theta = (X^T X)^{-1} X^T y\)
feature scaling is not required when using the normal equation
\(X\) is the \(m \times (n+1)\) design matrix containing all input vectors (one per row); \(y\) is the \(m \times 1\) vector of target values
\((X^T X)^{-1} X^T\) is called the pseudoinverse of \(X\)
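In Octave this is a one-liner (a sketch assuming X already contains the column of ones); pinv is preferred over inv for the reasons discussed in the noninvertibility section below:

```octave
% normal equation: no scaling, no learning rate, no iterations
theta = pinv(X' * X) * X' * y;
```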
derivation
write the cost in matrix form: \(J(\theta) = \frac{1}{2m} (X\theta - y)^T (X\theta - y)\)
derivative: \(\nabla_\theta J(\theta) = \frac{1}{m} X^T (X\theta - y)\); setting it to zero gives \(X^T X \theta = X^T y\), i.e. \(\theta = (X^T X)^{-1} X^T y\)
matrix derivative rules: matrix calculus wiki
trace: trace wiki
compare with gradient descent
| gradient descent | normal equation |
| --- | --- |
| need to choose \(\alpha\) | no need to choose \(\alpha\) |
| needs many iterations | no need to iterate |
| works well even when \(n\) is large | needs to compute \((X^T X)^{-1}\), an \((n+1) \times (n+1)\) matrix |
| | slow if \(n\) is very large |
| prefer when \(n > 10000\) | prefer when \(n\) is smaller |
normal equation noninvertibility
what if \(X^T X\) is non-invertible (singular)?
in Octave, two functions do the inversion: inv() and pinv(); using pinv() will solve the problem, since it computes the pseudoinverse and returns a correct \(\theta\) even when \(X^T X\) is singular
reasons for non-invertibility:
- redundant features (linearly dependent), see the demo below
  - e.g. size in square meters and size in square feet
- too many features (e.g. \(m \le n\))
  - delete some features or use regularization
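A hypothetical Octave demo of the redundant-feature case: the third column is a fixed multiple of the second, so \(X^T X\) is singular, inv() warns, but pinv() still returns a usable \(\theta\):

```octave
% singular X'*X caused by a redundant (linearly dependent) feature
m = 5;
x = (1:m)';
X = [ones(m, 1), x, 3.28^2 * x];   % e.g. square feet is a fixed multiple of square meters
y = 2 + 5 * x;
% inv(X' * X) warns that the matrix is singular and returns Inf entries;
% pinv still returns a theta whose predictions X*theta match y
theta = pinv(X' * X) * X' * y;
```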