Machine Learning Ex2  Linear Regression
Andrew Ng has posted introductory machine learning lessons on the OpenClassRoom site. I’ve watched the first set and will here solve Exercise 2.
The exercise is to build a linear regression implementation, I’ll use R.
The point of linear regression is to come up with a mathematical function(model) that represents the data as best as possible, that is done by fitting a straight line to the observed data. This model will then allow us to make predictions on new data.
For example, the data we use here are boys ages and their corresponding heights, so when we get the mathematical model we will be able to guess the boys height from his age.
Data
google.spreadsheet < function (key) {
library(RCurl)
# ssl validation off
ssl.verifypeer < FALSE
tt < getForm("https://spreadsheets.google.com/spreadsheet/pub",
hl ="en_GB",
key = key,
single = "true", gid ="0",
output = "csv",
.opts = list(followlocation = TRUE, verbose = TRUE))
read.csv(textConnection(tt), header = TRUE)
}
# load the data
mydata = google.spreadsheet("0AnypY27pPCJydDB4N3MxM0tENlk3UElnZ013cW1iM3c")
# include ggplot2
library(ggplot2)
ex2plot = ggplot(mydata, aes(x, y)) + geom_point() +
ylab('Height in meters') +
xlab('Age in years')
Theory
The model we will get at the end is a line that fits the data, is defined like so:
Setting \(x_0 = 1\):
That can be summarized by (last is matrix notation):
Matrix representation is useful because has good support in software tools.
Goal is to get the line closest to observed data points as possible, thus we can define a cost function that returns the difference of the real data vs myModel:
where i is each data example we have and m is their total.
With J we now have a metric to check if the hypotheses line is getting closer to data points or not.
Next step is to find the smaller cost as possible from J, and in fact thats exactly what the gradient descent algorithm does: starting with an initial guess it iterates to smaller and smaller values of a given function by following the direction of the derivative:
Applying to our J:
And doing a bit of calculus on derivatives we get:
Where alpha defines the size of steps of the convergence to \(\theta\).
Now lets check if all this math really works.
Implementation  take 1
alpha = 0.07
m = length(mydata$x)
theta = c(0,0)
x = mydata$x
y = mydata$y
delta = function(x,y,th,m) {
sum = 0
for (i in 1:m) {
sum = sum + (((t(th) %*% c(1,x[i]))  y[i]) * c(1,x[i]))
}
return (sum)
}
# 1 iteration
theta  alpha * 1/m * delta(x,y,theta,m)
1
[1] 0.07452802 0.38002167
Implementation  take 2
After having a peek at the Matlab solution, i learned that is possible to replace the sum in the equation with a transpose matrix multiplication(like done with the line equation):
So we can get a full matrix implementation:
alpha = 0.07
m = length(mydata$x)
theta = matrix(c(0,0), nrow=1)
x = matrix(c(rep(1,m), mydata$x), ncol=2)
y = matrix(mydata$y, ncol=1)
delta = function(x,y,th) {
delta = (t(x) %*% ((x %*% t(th))  y))
return(t(delta))
}
# 1 iteration
theta  alpha * 1/m * delta(x,y,theta)
[,1] [,2]
[1,] 0.07452802 0.3800217
The Model
First we run several iterations, until convergence:
for (i in 1:1500) {
theta = theta  alpha * 1/m * delta(x,y,theta)
}
theta
[,1] [,2]
[1,] 0.7501504 0.06388338
And finally we see how well the line(model) fits the data:
ex2plot + geom_abline(intercept=theta[1], slope=theta[2])
References
 MATLAB / R Reference, by David Hiebeler
 Short Math Guide for LaTex(.pdf)
 Matrix multiplier tool

Thanks to Andrew Ng and OpenClassRoom for the great lessons.
comments powered by Disqus