Hello World,
This is Saumya, and I am here to help you understand and implement Linear
Regression from scratch, without any libraries. Here, I will implement the code
in Python, but you can implement the algorithm in any other programming
language of your choice by writing just four or five simple functions.
So now, first of
all, what exactly is Linear Regression?
As discussed in my blog on Machine Learning, Linear Regression is used to identify linear relationships between the input features x(i) and the output labels y(i) of the training set, and thus form a function F(x(i), θ) which helps in predicting future values. This function is called the hypothesis and is usually denoted by h(x(i), θ).
Note that lowercase x(i) is used to denote the ith training example as a whole, whereas X(i, j) is used to point to the jth feature of the ith training example. A bit confusing? Let's simplify it!
To refer to the whole feature set of a single example, say the first one, we use x(1). To point to the first feature of the third example in the training set, we use X(3, 1) or x1(3).
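For instance, here is a toy training set with made-up numbers, stored as a list of lists, which is the shape our functions below will expect:

    # 3 training examples, 2 features each (made-up numbers)
    X = [[5.0, 2.0],   # x(1): the whole first training example
         [3.0, 7.0],   # x(2)
         [8.0, 1.0]]   # x(3)

    print(X[0])      # x(1) -> [5.0, 2.0]
    print(X[2][0])   # X(3,1): first feature of the third example -> 8.0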
For simplicity of understanding, let's assume there is only one feature in our dataset of 10 examples, and plot it on a graph. In that case, the hypothesis is just a straight line: hθ(x(i)) = θ0 + θ1*x1.
Or, to keep the notation uniform, let's say we have one more feature x0 which is always equal to 1, so that we can rewrite hθ(x(i)) = θ0*x0 + θ1*x1.
Or, for multiple features, we can write it as hθ(x(i)) = θ0*x0 + θ1*x1 + θ2*x2 + θ3*x3 + … + θn*xn for n features.
Therefore, in short, hθ(x(i)) = ∑(j = 0 to n) θj * xj(i).
Now, the code for our hypothesis
will be as follows:
def HypoThe(Theta, xi):
    # Hypothesis: h(x) = sum over all features of Theta[j] * xj
    if len(Theta) == len(xi):
        total = 0
        for i in range(len(xi)):
            total += Theta[i] * xi[i]
        return total
    else:
        # Parameter and feature counts must match
        return False
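For example, a quick sanity check with made-up numbers:

    # Theta = [1, 2], xi = [1, 3] -> 1*1 + 2*3 = 7
    print(HypoThe([1, 2], [1, 3]))   # prints 7
    print(HypoThe([1, 2], [1]))      # length mismatch -> returns False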
Now, since we have defined a mathematical model for prediction, the next problem we face is finding the best set of parameters Theta (θ).
To find Theta, let's break it into basic steps:
1. Assume some initial values of Theta.
2. Compute the prediction error.
3. Assume again, so that the new values reduce the prediction error.
Sounds complicated? Well, it's quite simple actually. Suppose we have selected some particular values for the parameters θ. Now, we will devise a cost function, which basically calculates the prediction error for that θ.
J(θ) = ∑(i = 1 to m) ( hθ(x(i)) − y(i) )² / (2*m)
What it means is that we predict the output using the hθ(x(i)) function, subtract the actual label y(i) from it, square the difference, and sum it up over all training examples (1…m). We average it out using (2*m). Why did we use 2*m instead of m? We'll find out soon. Let's just say it's for simplicity, and also because we are dealing with a mean-squared error function which we are supposed to minimize.
Although the cost function is useful only for debugging purposes, and not in the actual implementation, we'll still code it out.
def RegCostFunc(Theta, X, Y):
    # Cost J(Theta) = sum of squared prediction errors / (2 * m)
    sum1 = 0
    for i in range(len(X)):
        sum1 += (HypoThe(Theta, X[i]) - Y[i]) ** 2
    return sum1 / (2 * len(X))
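Since the cost function is mainly for debugging, here is one way to use it (a sketch on a made-up toy dataset): check that J(θ) is smaller for a better θ, and that it keeps dropping across iterations.

    # Toy data following y = 2*x, with the bias feature x0 = 1 already added
    X = [[1, 1], [1, 2], [1, 3]]
    Y = [2, 4, 6]
    print(RegCostFunc([0, 0], X, Y))   # bad guess -> 56/6 ≈ 9.33
    print(RegCostFunc([0, 2], X, Y))   # perfect fit -> 0.0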
Now, what we are supposed to do next is find those values of Theta for which J(θ) reaches its minimum. If you are familiar with calculus, what we're supposed to do is find the minima of the J(θ) function. To solve this optimization problem, we'll be using the gradient descent algorithm.
So what exactly does it do?
Let's write down the algorithm
first.
repeat until convergence: {
    for all values of θ, simultaneously update:
    θj = θj − α * ( ∂J(θ)/∂θj )
}
Let's understand this equation first.
( ∂J(θ)/∂θj ) means: take the partial derivative of J(θ) with respect to θj, while the rest of the θ are held constant.
Therefore,
∂J(θ)/∂θj = ∂( ∑(i = 1 to m) ( hθ(x(i)) − y(i) )² / (2*m) ) / ∂θj
If you solve this using calculus, we find that
∂J(θ)/∂θj = ∑(i = 1 to m) ( hθ(x(i)) − y(i) ) * xj(i) / m
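If you're curious about the intermediate step (and about why we divided by 2*m earlier), here's a quick sketch of the calculus. Apply the chain rule to the square, and note that hθ is linear in θj, so ∂hθ(x(i))/∂θj = xj(i):

∂J(θ)/∂θj = ∑(i = 1 to m) 2 * ( hθ(x(i)) − y(i) ) * xj(i) / (2*m)
          = ∑(i = 1 to m) ( hθ(x(i)) − y(i) ) * xj(i) / m

The 2 coming from differentiating the square cancels the 2 in 2*m, which is exactly why we defined the cost function with 2*m in the first place.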
This basically means: while keeping the rest constant, move the value of θj against the slope of the function J(θ). α is the learning rate, which controls how big a step we take when updating θj. If α is too large, the algorithm will overshoot the minimum and never actually converge.
Note: do not update each θj one at a time using already-updated values; update the whole θ simultaneously, i.e. compute all the new values first and assign them after one complete loop.
Rewriting the algorithm, we get:
repeat until convergence: {
    for all values of θ, simultaneously update:
    θj = θj − α * ( ∑(i = 1 to m) ( hθ(x(i)) − y(i) ) * xj(i) / m )
}
Let's code this now,
def GradTerm(X, Y, Theta, i):
    # Summation part of ∂J(θ)/∂θi: here i is the feature index,
    # and we loop over all training examples j
    sum1 = 0
    for j in range(len(X)):
        sum1 += (HypoThe(Theta, X[j]) - Y[j]) * X[j][i]
    return sum1
Note that I have separated the gradient term above from the gradient descent algorithm below, to keep the code simpler to understand.
def GradDesc(Theta, alpha, Xfeature, Ylabels):
    # Build a new list Theta_ so that all parameters are updated simultaneously
    Theta_ = []
    for i in range(len(Theta)):
        Theta_.append(Theta[i] - alpha * GradTerm(Xfeature, Ylabels, Theta, i) / len(Xfeature))
    return Theta_
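As a quick illustration (same made-up toy data as before), one call to GradDesc performs a single simultaneous update of all the parameters:

    X = [[1, 1], [1, 2], [1, 3]]
    Y = [2, 4, 6]
    Theta = GradDesc([0, 0], 0.1, X, Y)
    print(Theta)   # roughly [0.4, 0.933]: both parameters step towards the true [0, 2]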
Now we are done with everything required. So how do we run all these functions together? Basically, here is how we'll proceed:
1. We'll call linear regression and provide it with the features and labels.
2. We'll add the feature x0 = 1 to every example in the dataset.
3. We'll initialize Theta = [0]*len(Features[0]) (i.e. the number of parameters should be equal to the number of features).
4. We'll select a particular value of alpha and a number of iterations, and run the gradient descent algorithm to find the parameters θ that minimize J(θ).
Make sure to keep alpha small, on the order of 0.000001 or less, and the number of iterations around 100,000-1,000,000 for better results.
def LinearRegression(Xfeature, Ylabels, alpha, iterations):
    if len(Xfeature) != len(Ylabels):
        print("Missing Data")
        return False
    else:
        # Add the bias feature x0 = 1 to every training example
        for i in range(len(Xfeature)):
            Xfeature[i].insert(0, 1)
        # One parameter per feature (including the bias)
        Theta = [0] * len(Xfeature[0])
        for i in range(iterations):
            print("\nIteration Number ", i)
            print(Theta)
            Theta = GradDesc(Theta, alpha, Xfeature, Ylabels)
        print(Theta)
        return Theta
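To tie it all together, here is a small end-to-end run on a made-up dataset. Note that LinearRegression inserts x0 = 1 by itself, so we pass the raw features, and that it prints θ every iteration, so expect a lot of output. For this tiny, well-scaled dataset a larger alpha and fewer iterations already work; for bigger or unscaled data, follow the advice above.

    # Made-up data following y = 2*x
    Features = [[1], [2], [3], [4], [5]]
    Labels = [2, 4, 6, 8, 10]

    Theta = LinearRegression(Features, Labels, 0.01, 10000)
    # Theta should come out close to [0, 2].
    # To predict for a new input, remember to add the bias feature x0 = 1:
    print(HypoThe(Theta, [1, 6]))   # should be close to 12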
Now, I know you can easily implement this using the scikit-learn library, but then it's just a black-box module for machine learning; you'll never understand what is happening inside it. However, once you have studied machine learning in detail, you'll be much better at using the scikit-learn library and tweaking its values for better results.
Below is the complete code for this program. Make sure to import math and sys for it. I am not a professional at coding, so you might find some unwanted lines in the code, which you can optimize yourself.
That's it from this blog. If there are any suggestions or corrections, feel free to mention them in the comments section. Also, if you have any doubts, feel free to ask.
References:-
- Machine Learning by Andrew Ng,
Coursera.org (among the best MOOCs).