
Machine Learning

Hello World, This is Saumya, and I am here to help you understand the basics of Machine Learning: what exactly it means, what its types are, and how powerful a tool it can be.

We have all been hearing recently about the term "Artificial Intelligence" and how it will shape our future. Well, Machine Learning is a subfield of the vast field of A.I. Some of you might feel they are basically the same thing, but in reality, they are not. A.I. is a cluster of interconnected fields, which sometimes makes it difficult for us to visualize the differences between them all.

Now then, what is the difference?
By definition, A.I. is the attempt to create a machine that is capable of thinking the way we humans do and, specifically, of learning from experience.
On the other hand, M.L. is a computer's way of learning from data and then making decisions from the information obtained.

Again, we can say that ML is a part of AI, but AI is not limited to machine learning. For example, AI includes search and optimization techniques, such as genetic algorithms, that are not machine learning in themselves.


Now, let's take a detour and understand the roots of M.L.
So, the term "Machine Learning" was coined by Arthur Lee Samuel in 1959, the year he devised his famous checkers-playing program. He recorded the moves made in each state of the game and taught the machine to learn the most appropriate move for the current state of the board. After some time, the program surpassed his own ability and defeated him.

Now, there is a general saying that
"A program is only as smart as its programmer."
Cases like this certainly prove it wrong.

Samuel defined Machine Learning as "the field of study that gives computers the ability to learn without being explicitly programmed."

A more formal and technical definition, due to Tom Mitchell, states that "A computer program is said to learn from experience E with respect to some class of tasks T and performance measure P, if its performance at those tasks T, as measured by P, improves with experience E."

Yeah, I know, it will take a moment to understand it. But it is the best definition you can encounter.

Let's understand it for a moment with an example.
Let's say there is some task, like predicting the age of a person on the basis of a picture.
So T = predicting the age.
Now, we feed some training data to our machine: various pictures and their respective ages.
So E = various pictures and their respective ages.
Now we somehow devise an algorithm which extracts important features, performs some computations, and predicts the age of the person.
So P = the accuracy of our program in predicting the correct age.

So, by definition, our machine is said to be learning if its accuracy (P) at predicting the age (T) correctly increases as it gets more and more data (E).
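To make T, E and P concrete, here is a minimal, hypothetical sketch in Python. The "model" below (scaling a single made-up feature by a learned ratio) is purely illustrative; real age prediction from pictures would need far more sophisticated features and algorithms.

```python
# Hypothetical illustration of T, E, P for a toy prediction task.
# The dataset and the "model" below are made-up placeholders.

def accuracy(y_true, y_pred, tolerance=2):
    """P: fraction of predictions within `tolerance` years of the truth."""
    correct = sum(1 for t, p in zip(y_true, y_pred) if abs(t - p) <= tolerance)
    return correct / len(y_true)

# E: training data (fake "pictures" reduced to a single numeric feature)
train_features = [1.0, 2.0, 3.0, 4.0]
train_ages     = [10, 20, 30, 40]

# T: predicting age. A trivial "model": scale the feature by a learned ratio.
ratio = sum(a / f for f, a in zip(train_features, train_ages)) / len(train_ages)
predictions = [ratio * f for f in train_features]

print(accuracy(train_ages, predictions))  # P measured on the training data
```

If P goes up as we add more examples to E, the machine is "learning" in Mitchell's sense.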

If you can't relate to it yet, try finding the E, P and T for some basic Machine Learning examples like spam classification and the Checkers game.

Moving ahead, we can divide Machine Learning algorithms based on the type of data provided to us. They fall mainly into two categories: Supervised and Unsupervised Learning. Several newer types of Machine Learning algorithms have emerged, such as Reinforcement Learning and Deep Learning, but Supervised and Unsupervised Learning form the basis of them all.

Now, before we dive into each of these individually, let's understand the basic difference between them and some minor terminologies associated with Machine Learning.

While studying machine learning, you'll come across terms like features, labels and parameters. Features are the attributes in the training data; they are the characteristics of each example in the training data set.

Suppose we have a dataset of the area of each house, the number of rooms in it, and its price in USD, for example:

Area (sq. mtrs)   Rooms   Price (USD)
120               3       250,000
80                2       160,000
200               5       420,000

Based on the problem we are given, let's say we are to predict the house prices based on this given data.
So, the labels (usually denoted as y(i)) would be the prices. That is, we are told to predict prices, and we are already provided some sample prices, so these sample prices are our labels.

Now, features (X(i,j) or x(i)) can be defined as the characteristics or attributes on which our labels depend. So, Area and Rooms can be termed our features.
Hence, for a house with an unknown price, we will need its features (Area, Rooms) to predict its label.
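The features/labels split can be written out in a few lines of Python. The housing data below is made up for illustration:

```python
# Hypothetical housing dataset: features x(i) = (area, rooms), label y(i) = price.
dataset = [
    {"area": 120, "rooms": 3, "price": 250_000},
    {"area": 80,  "rooms": 2, "price": 160_000},
    {"area": 200, "rooms": 5, "price": 420_000},
]

X = [(row["area"], row["rooms"]) for row in dataset]  # features
y = [row["price"] for row in dataset]                 # labels

# For a new house with an unknown price, we only have its features;
# some model would map them to a predicted label (price).
new_house = (150, 4)
print(X, y)
```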

Parameters can be termed the "weights" for the features on which the labels depend. They are usually denoted as Theta (θ). You'll understand them in more depth when you learn about the individual algorithms.

In practical scenarios, the training dataset size (m) is usually in the range of 10,000 to 100,000,000, and each training example has somewhere between 100 and 10,000 features (n). So your training data set would be of size (m × n), which is huge.

So, the simple method of distinguishing between supervised and unsupervised learning algorithms is through the labels (y(i)). If labels are present, then it's a task for supervised learning; if no labels are provided, then it's an unsupervised learning task.

Formally, let's define and distinguish Supervised and Unsupervised Learning.

Supervised Learning, as said earlier, has labels present to help us train on the data. It basically means that we are told, "Hey there! Here is some training data, and here are their respective answers. Now go and find the possible mathematical relationship between the training data and their labels, and use it to deduce possible solutions for future problems."

In the case of Unsupervised Learning, on the other hand, we are told, "Hey there, again! Here is some training data, and we don't have any information about what we are supposed to do. So just find some patterns and information in this data."

Yeah, Unsupervised learning sounds so black-boxed, but it's quite simple to practice.

Let's delve a bit deeper into them both.

Now, what can we do with Supervised Learning?
We are given some features, and some answers relating to those features. We can use these to form and define a mathematical relationship between the input training data and the already given output label data. This relationship usually helps in predicting values.
Now, what about the value predicted?
The predicted value can be continuous (the price of a house), Boolean (Yes or No), or, if we divide further, a class (A or B or C).

Predicting continuous values is done using a type of algorithm called a Regression Algorithm. It creates a function F(x(i), θ) to predict values (y) for an unknown dataset. This function F(x(i), θ) can be linear (Linear Regression) or polynomial (Polynomial Regression).
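As a sketch of the idea (not any particular library's implementation), here is a one-feature linear regression trained by gradient descent on made-up, noiseless data; theta0 and theta1 are the parameters θ mentioned earlier.

```python
# Minimal linear-regression sketch: F(x, θ) = θ0 + θ1·x.
# The data below was generated from y = 2x + 1 for illustration.

xs = [1.0, 2.0, 3.0, 4.0]
ys = [3.0, 5.0, 7.0, 9.0]

theta0, theta1 = 0.0, 0.0
lr = 0.05          # learning rate
m = len(xs)

for _ in range(5000):
    preds = [theta0 + theta1 * x for x in xs]
    # Gradients of the mean squared error with respect to θ0 and θ1
    grad0 = sum(p - y for p, y in zip(preds, ys)) / m
    grad1 = sum((p - y) * x for p, y, x in zip(preds, ys, xs)) / m
    theta0 -= lr * grad0
    theta1 -= lr * grad1

print(theta0, theta1)  # should approach 1.0 and 2.0
```

The same loop with powers of x as extra features would give polynomial regression.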

Classifying our data into various classes (True, False) or (A, B, C) is done using a Classification Algorithm. It predicts predefined class values (y) for unknown datasets. Classification can be implemented through Logistic Regression, Neural Networks, Support Vector Machines, Decision Trees, Naïve Bayes Classification and the KNN algorithm. All of them are implemented using different techniques, and we might learn about them in the future on this blog.
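Of the algorithms listed, KNN is the easiest to sketch in a few lines. The points and labels below are invented; the classifier simply lets the k nearest training points vote on the class.

```python
# A tiny k-nearest-neighbours (KNN) classifier on made-up 2-D data.

def knn_predict(train_X, train_y, query, k=3):
    # Sort training points by squared distance to the query,
    # then take a majority vote among the k nearest labels.
    dists = sorted(
        (sum((a - b) ** 2 for a, b in zip(x, query)), label)
        for x, label in zip(train_X, train_y)
    )
    nearest = [label for _, label in dists[:k]]
    return max(set(nearest), key=nearest.count)

train_X = [(1, 1), (1, 2), (2, 1), (8, 8), (8, 9), (9, 8)]
train_y = ["A", "A", "A", "B", "B", "B"]
print(knn_predict(train_X, train_y, (2, 2)))  # → A
```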

That's all for Supervised Learning. However, it is important to understand when a problem is a classification problem and when it is a regression problem; most people get confused differentiating between the two.

Moving on to Unsupervised Learning: it is used to extract or recognize patterns in data about which we have no information regarding relationships or how important each feature is. It can be used to cluster similar data or, conversely, to find anomalous data. It is also useful for reducing the dimensionality of the features in the data set.

Grouping similar data sounds just like classification, I know, but the main difference between a classification problem and a clustering problem is this: in classification, we know the number of classes and which classes we want to group the data into. We know that a mail will be either spam or not spam. In clustering, however, we don't know in advance what groups the data will form, or even how many classes/clusters to group it into. Clustering is usually done using the K-means algorithm, which in my opinion is the simplest of all the machine learning algorithms.
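A bare-bones illustration of K-means on one-dimensional, made-up data: alternate between assigning each point to its nearest centroid and moving each centroid to the mean of its cluster.

```python
# Minimal K-means sketch (K=2) on made-up 1-D data.
import random

def kmeans(points, k=2, iters=20):
    centroids = random.sample(points, k)  # random initial centroids
    for _ in range(iters):
        # Assignment step: attach each point to its nearest centroid
        clusters = [[] for _ in range(k)]
        for p in points:
            idx = min(range(k), key=lambda i: abs(p - centroids[i]))
            clusters[idx].append(p)
        # Update step: move each centroid to the mean of its cluster
        centroids = [sum(c) / len(c) if c else centroids[i]
                     for i, c in enumerate(clusters)]
    return sorted(centroids)

random.seed(0)
points = [1.0, 1.2, 0.8, 9.0, 9.2, 8.8]
print(kmeans(points))  # centroids near 1.0 and 9.0
```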

Anomalous data can be identified using anomaly detection, which is usually based on the Gaussian (normal) distribution. These problems are purely statistics-based and easy to solve.
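A sketch of the idea: fit a Gaussian to the data and flag points whose probability density falls below a threshold. The data and the threshold epsilon below are made up; in practice epsilon is tuned on a validation set.

```python
# Gaussian anomaly detection sketch on made-up data.
import math

data = [9.8, 10.1, 10.0, 9.9, 10.2, 10.0, 25.0]  # 25.0 is the oddball

# Fit a normal distribution: sample mean and (population) variance
mu = sum(data) / len(data)
var = sum((x - mu) ** 2 for x in data) / len(data)

def gaussian_pdf(x):
    return math.exp(-(x - mu) ** 2 / (2 * var)) / math.sqrt(2 * math.pi * var)

epsilon = 0.01  # assumed threshold, chosen for this toy example
anomalies = [x for x in data if gaussian_pdf(x) < epsilon]
print(anomalies)  # → [25.0]
```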

Dimensionality Reduction, on the other hand, involves a lot of linear algebra and is done using Principal Component Analysis. We find a common axis for two different features showing similar characteristics. In layman's terms, if we are given both height and width, we might remove both features and replace them with a single combined measure of size, without even recognizing what we have done.
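A tiny two-dimensional PCA sketch on made-up points: centre the data, form the 2×2 covariance matrix, take the eigenvector of its largest eigenvalue as the principal axis, and project each point onto it, reducing two features to one.

```python
# Minimal 2-D PCA sketch: project correlated data onto its principal axis.
import math

# Made-up points lying roughly along the line y = x
pts = [(1.0, 1.1), (2.0, 1.9), (3.0, 3.2), (4.0, 3.8)]

# Centre the data
mx = sum(x for x, _ in pts) / len(pts)
my = sum(y for _, y in pts) / len(pts)
centered = [(x - mx, y - my) for x, y in pts]

# Entries of the 2x2 covariance matrix
sxx = sum(x * x for x, _ in centered) / len(pts)
syy = sum(y * y for _, y in centered) / len(pts)
sxy = sum(x * y for x, y in centered) / len(pts)

# Largest eigenvalue and its (normalized) eigenvector = principal direction
lam = (sxx + syy + math.sqrt((sxx - syy) ** 2 + 4 * sxy ** 2)) / 2
vx, vy = sxy, lam - sxx
norm = math.hypot(vx, vy)
vx, vy = vx / norm, vy / norm

# Dimensionality reduction: each 2-D point becomes one coordinate
reduced = [x * vx + y * vy for x, y in centered]
print(reduced)
```

For the data above the principal axis comes out close to the 45° line, so the single coordinate captures almost all of the variation in the two original features.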

That's it from this blog, if there are any suggestions, or corrections, feel free to mention in the comment section. Also if you have any doubts, feel free to ask.


  • Machine Learning by Andrew Ng (among the best MOOCs).

