# Machine Learning Ex2 - Linear Regression

Andrew Ng has posted introductory machine learning lessons on the OpenClassRoom site. I’ve watched the first set and will solve Exercise 2 here.

The exercise is to build a linear regression implementation. I’ll use R.

The point of linear regression is to come up with a mathematical function (a model) that represents the data as well as possible. This is done by fitting a straight line to the observed data. The model then allows us to make predictions on new data.

For example, the data we use here are boys' ages and their corresponding heights, so once we have the mathematical model we will be able to estimate a boy's height from his age.

## Data

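    google.spreadsheet <- function (key) {
      library(RCurl)
      # ssl validation off
      ssl.verifypeer <- FALSE

      # The request URL did not survive in this listing: sheet.url is a
      # placeholder for the spreadsheet's published-CSV endpoint.
      tt <- getForm(sheet.url,
                    hl = "en_GB",
                    key = key,
                    single = "true", gid = "0",
                    output = "csv",
                    .opts = list(followlocation = TRUE, verbose = TRUE))

      # parse the returned CSV text into a data frame
      read.csv(textConnection(tt), header = TRUE)
    }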

    # include ggplot2
    library(ggplot2)

    # mydata is the data frame loaded above, with columns
    # x (age in years) and y (height in meters)
    ex2plot = ggplot(mydata, aes(x, y)) + geom_point() +
      ylab('Height in meters') +
      xlab('Age in years')


## Theory

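The model we end up with is a straight line fitted to the data, defined as:

$$h_\theta(x) = \theta_0 + \theta_1 x_1$$

Setting $$x_0 = 1$$:

$$h_\theta(x) = \theta_0 x_0 + \theta_1 x_1$$

That can be summarized as follows (the last form is matrix notation):

$$h_\theta(x) = \sum_{j=0}^{n} \theta_j x_j = \theta^T x$$

The matrix representation is useful because it is well supported by software tools.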
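
As a small illustration of why the matrix form is convenient, here is a sketch that evaluates the line both ways. The ages and the parameter values are made up for the example; they are not the exercise data.

```r
# Made-up ages (years); these are NOT the exercise data.
age <- c(2, 4, 6, 8)

# Design matrix X: a column of ones (x_0 = 1) next to the ages (x_1).
X <- cbind(1, age)

# Made-up parameters: intercept theta_0 and slope theta_1.
theta <- c(0.75, 0.06)

# Element-wise form: theta_0 * x_0 + theta_1 * x_1 for every row.
h_sum <- theta[1] * X[, 1] + theta[2] * X[, 2]

# Matrix form: one product evaluates theta^T x for all rows at once.
h_mat <- drop(X %*% theta)

print(h_sum)
print(h_mat)
```

Both vectors hold the same predicted heights, but the matrix form needs no explicit indexing and scales to more features without changes.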

The goal is to get the line as close as possible to the observed data points, so we define a cost function that measures the difference between the real data and the model's predictions:

$$J(\theta) = \frac{1}{2m} \sum_{i=1}^{m} \left(h_\theta(x^{(i)}) - y^{(i)}\right)^2$$

where $$i$$ indexes the data examples we have and $$m$$ is their total number.

With J we now have a metric to check whether the hypothesis line is getting closer to the data points or not.
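
To make the metric concrete, here is a minimal sketch of J in R, assuming the usual half mean squared error form, 1/(2m) times the sum of squared differences. The function name `cost` and the tiny data set are made up for illustration.

```r
# Cost J(theta) = 1/(2m) * sum((h - y)^2) for a design matrix X whose
# first column is all ones. Names here are illustrative only.
cost <- function(X, y, theta) {
  m <- length(y)
  residuals <- drop(X %*% theta) - y
  sum(residuals^2) / (2 * m)
}

# Made-up data lying exactly on the line y = 1 + 2x.
X <- cbind(1, c(0, 1, 2))
y <- c(1, 3, 5)

cost(X, y, c(1, 2))  # parameters of the true line
cost(X, y, c(0, 0))  # an all-zero guess
```

The first call returns 0 because the line passes through every point; the all-zero guess gives a larger value, which is the signal gradient descent uses to improve the parameters.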

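The next step is to find the smallest possible value of J, and that is exactly what the gradient descent algorithm does: starting from an initial guess, it iterates towards smaller and smaller values of a given function by stepping against the direction of its derivative:

$$\theta := \theta - \alpha \frac{d}{d\theta} f(\theta)$$

Applying this to our J:

$$\theta_j := \theta_j - \alpha \frac{\partial}{\partial \theta_j} J(\theta)$$

And doing a bit of calculus on the derivatives we get:

$$\theta_j := \theta_j - \alpha \frac{1}{m} \sum_{i=1}^{m} \left(h_\theta(x^{(i)}) - y^{(i)}\right) x_j^{(i)}$$

Where $$\alpha$$ defines the size of the steps taken while converging to $$\theta$$.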
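
Before applying this to regression, the idea can be sketched on a toy one-variable function. The function `f`, its derivative `df`, the step size and the starting point are all made up for illustration.

```r
# Toy function with its minimum at theta = 3, and its derivative.
f  <- function(theta) (theta - 3)^2
df <- function(theta) 2 * (theta - 3)

alpha <- 0.1  # step size (made-up value)
theta <- 0    # initial guess

for (i in 1:100) {
  # Step against the derivative, scaled by alpha.
  theta <- theta - alpha * df(theta)
}

theta     # close to 3
f(theta)  # close to 0
```

Each step shrinks the distance to the minimum by a constant factor here, so the guess settles near 3. A step size that is too large would overshoot and diverge instead.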

Now let's check if all this math really works.

## Implementation - take 1

    alpha = 0.07
    m = length(mydata$x)
    theta = c(0,0)
    x = mydata$x
    y = mydata$y
    delta = function(x,y,th,m) {
      sum = 0
      for (i in 1:m) {
        # drop() turns the 1x1 matrix product into a plain number
        sum = sum + ((drop(t(th) %*% c(1,x[i])) - y[i]) * c(1,x[i]))
      }
      return (sum)
    }

    # 1 iteration
    theta - alpha * 1/m * delta(x,y,theta,m)

    [1] 0.07452802 0.38002167

## Implementation - take 2

After having a peek at the Matlab solution, I learned that it is possible to replace the sum in the equation with a multiplication by a transposed matrix (as was done with the line equation):

$$\sum_{i=1}^{m} \left(h_\theta(x^{(i)}) - y^{(i)}\right) x^{(i)} = X^T (X\theta - y)$$

where each row of $$X$$ is one example $$x^{(i)}$$, including $$x_0 = 1$$.

So we can get a full matrix implementation:

    alpha = 0.07
    m = length(mydata$x)
    theta = matrix(c(0,0), nrow=1)
    x = matrix(c(rep(1,m), mydata$x), ncol=2)
    y = matrix(mydata$y, ncol=1)
    delta = function(x,y,th) {
      # theta is stored as a row vector, hence the transposes
      delta = (t(x) %*% ((x %*% t(th)) - y))
      return(t(delta))
    }

    # 1 iteration
    theta - alpha * 1/m * delta(x,y,theta)

               [,1]      [,2]
    [1,] 0.07452802 0.3800217
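
The agreement between the two takes can also be checked directly. The sketch below does so on synthetic stand-in data (not the exercise measurements); the variable names are made up for the example.

```r
# Synthetic stand-in for the exercise data (NOT the real measurements).
set.seed(1)
age    <- runif(20, min = 2, max = 8)
height <- 0.75 + 0.06 * age + rnorm(20, sd = 0.02)
m      <- length(age)
theta  <- c(0, 0)

# Loop form: accumulate (h(x_i) - y_i) * x_i one example at a time.
grad_loop <- c(0, 0)
for (i in 1:m) {
  xi <- c(1, age[i])
  grad_loop <- grad_loop + (sum(theta * xi) - height[i]) * xi
}

# Matrix form: X^T (X theta - y) in a single expression.
X <- cbind(1, age)
grad_mat <- drop(t(X) %*% (X %*% theta - height))

# TRUE when both forms agree up to floating-point error.
all.equal(grad_loop, grad_mat, check.attributes = FALSE)
```

The matrix form is shorter and hands the summation to R's linear algebra routines, which is usually faster than an explicit `for` loop.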


## The Model

First we run several iterations, until convergence:

    for (i in 1:1500) {
      theta = theta - alpha * 1/m * delta(x,y,theta)
    }
    theta

              [,1]       [,2]
    [1,] 0.7501504 0.06388338
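
A fixed count of 1500 iterations works here, but convergence can also be tested explicitly. The following is only a sketch on synthetic stand-in data: the tolerance `tol`, the cap `max_iter` and the smaller step size are made-up values, and the comparison with R's built-in `lm()` is an added sanity check rather than part of the exercise.

```r
# Synthetic stand-in data (NOT the exercise measurements).
set.seed(1)
age    <- runif(50, min = 2, max = 8)
height <- 0.75 + 0.06 * age + rnorm(50, sd = 0.02)

X <- cbind(1, age)
y <- height
m <- length(y)

# Half mean squared error, used to monitor progress.
cost <- function(theta) sum((drop(X %*% theta) - y)^2) / (2 * m)

alpha    <- 0.05    # step size chosen conservatively for this synthetic data
tol      <- 1e-12   # stop once J decreases by less than this
max_iter <- 100000  # safety cap on the number of iterations
theta    <- c(0, 0)

for (iter in 1:max_iter) {
  grad      <- drop(t(X) %*% (drop(X %*% theta) - y)) / m
  new_theta <- theta - alpha * grad
  improved  <- cost(theta) - cost(new_theta)
  theta     <- new_theta
  if (improved < tol) break
}

iter                    # iterations actually used
theta                   # gradient descent estimate
coef(lm(height ~ age))  # least-squares fit for comparison
```

If the two sets of coefficients agree to a few decimal places, the loop has converged in practice; a large gap would suggest more iterations or a different step size.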


And finally we see how well the line (our model) fits the data:

    ex2plot + geom_abline(intercept=theta[1], slope=theta[2])
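
With the fitted parameters the model can be used for the predictions mentioned at the start. This sketch copies the converged values of theta printed above; the helper `predict_height` and the two example ages are made up for illustration.

```r
# Parameters copied from the converged theta printed above.
theta <- c(0.7501504, 0.06388338)

# Hypothetical helper: predicted height (meters) for ages in years.
predict_height <- function(age, theta) {
  drop(cbind(1, age) %*% theta)
}

# Example ages chosen for illustration.
predict_height(c(3.5, 7), theta)
```

For these ages the model gives roughly 0.97 m and 1.20 m.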