evaluating a learning algorithm

deciding what to try next

debugging a learning algorithm

Suppose the trained hypothesis makes unacceptably large errors in its predictions. Options to try next:

get more training examples - fix high variance
try smaller sets of features - fix high variance
try getting additional features - fix high bias
try adding polynomial features - fix high bias
try decreasing $\lambda$ - fix high bias
try increasing $\lambda$ - fix high variance

diagnostic

A test that you can run to gain insight into what is/isn’t working with a learning algorithm, and to gain guidance on how best to improve its performance.

Diagnostics can take time to implement, but doing so can be a very good use of time.

evaluating a hypothesis

training set: 70% (shuffle the data randomly before splitting)
test set: 30%

training/testing procedure for linear regression

learn parameter $\theta$ from the training data (minimizing training error $J(\theta)$ )
compute the test set error $J_{test}(\theta)$ (the same cost function, evaluated on the test set)
for a classification problem, can also compute the misclassification error (fraction of wrong predictions on the test set)
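
The procedure above can be sketched in a few lines of plain Python. This is a minimal illustration, not a reference implementation: the helper names (`split_train_test`, `squared_error_cost`, `misclassification_error`), the single-feature hypothesis $h(x) = \theta_0 + \theta_1 x$, and the toy data are all assumptions made for the example.

```python
import random

def split_train_test(examples, train_fraction=0.7, seed=0):
    """Shuffle a copy of the examples, then split into (train, test)."""
    shuffled = list(examples)
    random.Random(seed).shuffle(shuffled)
    cut = int(round(train_fraction * len(shuffled)))
    return shuffled[:cut], shuffled[cut:]

def squared_error_cost(theta, examples):
    """J(theta) = 1/(2m) * sum((h(x) - y)^2) with h(x) = theta[0] + theta[1] * x."""
    m = len(examples)
    return sum((theta[0] + theta[1] * x - y) ** 2 for x, y in examples) / (2 * m)

def misclassification_error(predicted, actual):
    """Fraction of examples whose predicted label differs from the actual label."""
    wrong = sum(1 for p, a in zip(predicted, actual) if p != a)
    return wrong / len(actual)

# Toy, noise-free data (an assumption for illustration): y = 2x + 1.
data = [(x, 2.0 * x + 1.0) for x in range(10)]
train, test = split_train_test(data)

print(len(train), len(test))                      # 70% / 30% of 10 examples
print(squared_error_cost([1.0, 2.0], test))       # test error of the true parameters
print(misclassification_error([1, 0, 1, 1], [1, 0, 0, 1]))  # 1 wrong out of 4
```

In a real run, `theta` would come from minimizing the cost on `train` only; the true parameters are plugged in here just to keep the sketch short.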

model selection and training/validation/test sets

model selection

d: what degree of polynomial to choose for hypothesis

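calculate the test set error for different degrees of polynomial and choose the one with minimum error. The test error of the chosen model is then an optimistic estimate of the generalization error, because $d$ was itself fit to the test set.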

evaluating hypothesis

training set: 60%
cross validation set: 20%
test set: 20%

Fit the parameters for each degree of polynomial on the training set, choose the degree with the lowest error on the cross validation set, then estimate the generalization error on the test set.
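
A rough sketch of this selection procedure, assuming NumPy is available. The synthetic quadratic data, the candidate degrees 1 to 8, and the helper `half_mse` are assumptions made for the example; `np.polyfit` stands in for whatever training routine is actually used.

```python
import numpy as np

def half_mse(coeffs, x, y):
    """Unregularized cost J = 1/(2m) * sum((h(x) - y)^2) for a polynomial h."""
    return float(np.mean((np.polyval(coeffs, x) - y) ** 2) / 2)

# Synthetic data (an assumption for illustration): a noisy quadratic.
rng = np.random.default_rng(0)
x = rng.uniform(-1, 1, 200)
y = 1.0 - 2.0 * x + 3.0 * x**2 + rng.normal(0, 0.1, x.size)

# Shuffle the indices once, then split 60% / 20% / 20%.
n = x.size
train, cv, test = np.split(rng.permutation(n), [n * 6 // 10, n * 8 // 10])

# Fit every candidate degree on the training set only.
degrees = range(1, 9)
fits = {d: np.polyfit(x[train], y[train], d) for d in degrees}

# Choose d on the cross validation set.
cv_errors = {d: half_mse(fits[d], x[cv], y[cv]) for d in degrees}
best_d = min(cv_errors, key=cv_errors.get)

# The test set is touched only once, to estimate generalization error.
test_error = half_mse(fits[best_d], x[test], y[test])
print(best_d, cv_errors[best_d], test_error)
```

Because the test set plays no part in choosing `best_d`, `test_error` is not biased by the selection step.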

bias vs. variance

diagnosing bias vs. variance

High bias: underfit. High training error and high validation error (the two are similar).
High variance: overfit. Low training error and a much higher validation error.

regularization and bias/variance

regularization parameter $\lambda$

large $\lambda$ : high bias (underfit)
small $\lambda$ : high variance (overfit)

Define the cost function of training, validation and test sets without regularization terms.

Create a list of $\lambda$ values
(e.g. $\lambda \in \{0,0.01,0.02,0.04,0.08,0.16,0.32,0.64,1.28,2.56,5.12,10.24\}$ );
Create a set of models with different degrees or any other variants.
Iterate through the $\lambda$ s and for each $\lambda$ go through all the models to learn some $\theta$ .
Learn the parameter $\theta$ for the model selected, using $J_{train}(\theta)$ with the $\lambda$ selected.
Compute the training error by evaluating $J_{train}(\theta)$ without regularization (i.e. with $\lambda=0$ ) at the learned $\theta$ (which was computed with $\lambda$ ).
Compute the cross validation error by evaluating $J_{CV}(\theta)$ without regularization (i.e. with $\lambda=0$ ) at the same learned $\theta$ .
Select the combination of model and $\lambda$ that produces the lowest error on the cross validation set.
Evaluate $J_{test}(\theta)$ with the best combination of $\theta$ and $\lambda$ to see whether it generalizes well.
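
The steps above can be sketched as follows, assuming NumPy. The helpers `poly_features`, `fit_ridge` and `cost`, the synthetic data, and the candidate degrees are assumptions made for the example. Training uses the regularized normal equation rather than gradient descent, which is a substitution made for brevity.

```python
import numpy as np

def poly_features(x, degree):
    """Design matrix with columns [1, x, x^2, ..., x^degree]."""
    return np.vander(x, degree + 1, increasing=True)

def fit_ridge(X, y, lam):
    """Regularized normal equation; the intercept theta_0 is not penalized."""
    penalty = lam * np.eye(X.shape[1])
    penalty[0, 0] = 0.0
    return np.linalg.solve(X.T @ X + penalty, X.T @ y)

def cost(theta, X, y):
    """Unregularized J(theta) = 1/(2m) * sum((X @ theta - y)^2)."""
    return float(np.mean((X @ theta - y) ** 2) / 2)

# Synthetic data (an assumption for illustration).
rng = np.random.default_rng(0)
x = rng.uniform(-1, 1, 200)
y = np.sin(3 * x) + rng.normal(0, 0.2, x.size)
n = x.size
train, cv, test = np.split(rng.permutation(n), [n * 6 // 10, n * 8 // 10])

lambdas = [0, 0.01, 0.02, 0.04, 0.08, 0.16, 0.32, 0.64, 1.28, 2.56, 5.12, 10.24]
degrees = [1, 2, 4, 8]

results = {}
for lam in lambdas:
    for d in degrees:
        # Train WITH regularization ...
        theta = fit_ridge(poly_features(x[train], d), y[train], lam)
        # ... but report errors WITHOUT the regularization term.
        train_err = cost(theta, poly_features(x[train], d), y[train])
        cv_err = cost(theta, poly_features(x[cv], d), y[cv])
        results[(lam, d)] = (theta, train_err, cv_err)

# Pick the combination with the lowest cross validation error.
best_lam, best_d = min(results, key=lambda k: results[k][2])
best_theta = results[(best_lam, best_d)][0]
test_err = cost(best_theta, poly_features(x[test], best_d), y[test])
print(best_lam, best_d, test_err)
```

Reporting errors without the penalty keeps the numbers comparable across different $\lambda$ values, since the penalty term itself changes with $\lambda$.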

learning curve

Plot $J_{train}(\theta)$ and $J_{CV}(\theta)$ against the training set size $m$.
As $m$ increases:

$J_{train}(\theta)$ increases
$J_{CV}(\theta)$ decreases

When bias is high:

The final errors for both training and validation will be high and similar
Getting more training data will not help much

When variance is high:

there is a large gap between the final training and validation errors, but the two approach each other as $m$ increases
getting more training data is likely to help
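
A learning curve can be computed by refitting on growing prefixes of the training set. The sketch below assumes NumPy; the function `learning_curve`, the synthetic data, and the use of a degree-1 fit as a high-bias model and a degree-8 fit as a more flexible one are assumptions made for illustration.

```python
import numpy as np

def learning_curve(x_train, y_train, x_cv, y_cv, degree, sizes):
    """Return (J_train, J_cv) lists, refitting on the first m training examples."""
    def cost(coeffs, x, y):
        return float(np.mean((np.polyval(coeffs, x) - y) ** 2) / 2)

    j_train, j_cv = [], []
    for m in sizes:
        coeffs = np.polyfit(x_train[:m], y_train[:m], degree)
        j_train.append(cost(coeffs, x_train[:m], y_train[:m]))  # only the m examples used
        j_cv.append(cost(coeffs, x_cv, y_cv))                    # always the full CV set
    return j_train, j_cv

# Synthetic data (an assumption); it is drawn at random, so no extra shuffle is needed.
rng = np.random.default_rng(0)
x = rng.uniform(-1, 1, 300)
y = np.sin(3 * x) + rng.normal(0, 0.2, x.size)
x_train, y_train, x_cv, y_cv = x[:200], y[:200], x[200:], y[200:]
sizes = [15, 30, 60, 120, 200]

bias_train, bias_cv = learning_curve(x_train, y_train, x_cv, y_cv, degree=1, sizes=sizes)
flex_train, flex_cv = learning_curve(x_train, y_train, x_cv, y_cv, degree=8, sizes=sizes)

for m, bt, bc, ft, fc in zip(sizes, bias_train, bias_cv, flex_train, flex_cv):
    print(f"m={m:3d}  degree 1: train={bt:.3f} cv={bc:.3f}   degree 8: train={ft:.3f} cv={fc:.3f}")
```

Note that $J_{train}$ is measured only on the $m$ examples the model was fit to, while $J_{CV}$ is always measured on the whole cross validation set. Plotting the two lists against `sizes` gives the curves described above.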

neural network

small neural network: fewer parameters; more prone to underfitting; computationally cheaper
large neural network: more parameters; more prone to overfitting; computationally more expensive; use regularization to address overfitting

precision and recall (predicted class vs. actual class)

precision = (true positives)/(no. of predicted positive)
recall = (true positives)/(no. of actual positive)

use the F1 score to compare algorithms with a single number: $F_1 = 2 \frac{PR}{P+R}$ , where $P$ is precision and $R$ is recall
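
The three quantities can be computed directly from predicted and actual labels. This is a minimal sketch in plain Python; the function name `precision_recall_f1`, the convention that label 1 marks the positive class, and the choice to return 0.0 when a ratio is undefined are assumptions made for the example.

```python
def precision_recall_f1(predicted, actual):
    """Precision, recall and F1 for binary labels, where 1 marks the positive class."""
    true_pos = sum(1 for p, a in zip(predicted, actual) if p == 1 and a == 1)
    predicted_pos = sum(1 for p in predicted if p == 1)
    actual_pos = sum(1 for a in actual if a == 1)

    # Assumed convention: report 0.0 when a denominator would be zero.
    precision = true_pos / predicted_pos if predicted_pos else 0.0
    recall = true_pos / actual_pos if actual_pos else 0.0
    f1 = 2 * precision * recall / (precision + recall) if precision + recall else 0.0
    return precision, recall, f1

# Made-up labels: 3 true positives, 4 predicted positives, 5 actual positives.
predicted = [1, 1, 0, 1, 0, 0, 1, 0]
actual = [1, 0, 0, 1, 1, 1, 1, 0]
precision, recall, f1 = precision_recall_f1(predicted, actual)
print(precision, recall, f1)
```

Here precision is 3/4 and recall is 3/5, so the F1 score falls between the two and is pulled toward the lower value.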

Stanford ML 6 Evaluate Algorithms