Hello World,
This is Saumya, and I am here to help you understand the basics of Machine Learning: what exactly it means, what its types are, and how powerful a tool it can be.
We have all been hearing a lot recently about the term "Artificial Intelligence" and how it will shape our future. Well, Machine Learning is just one subfield of the vast field of A.I. Some of you might feel they are basically the same thing, but in reality, they are not. A.I. is a cluster of interconnected fields, which sometimes makes it difficult for us to visualize the differences between them all.
Now then, what is the difference?
By definition, A.I. is about creating a machine that is capable of thinking the way we humans do, and specifically of learning from experience.
M.L., on the other hand, is a computer's way of learning from data and then making decisions based on the information obtained.
Again, we can say that ML is a part of AI, but AI is not limited to machine learning. For example, AI includes techniques for solving optimization problems, such as genetic algorithms, that are not machine learning.
Now, let's take a detour and understand the roots of M.L.
The term "Machine Learning" was coined by Arthur Lee Samuel in 1959, the year he devised his famous checkers-playing program. He recorded every move made in every state of the game and taught the machine to learn the most appropriate move for the current state. After some time, the program surpassed his ability and defeated him.
Now, there is a general saying that "a program is only as smart as its programmer." It certainly proves to be wrong in such cases.
Samuel defined Machine Learning as "the field of study that gives computers the ability to learn without being explicitly programmed."
A more formal and technical definition, due to Tom Mitchell, states that "a computer program is said to learn from experience E with respect to some class of tasks T and performance measure P, if its performance at tasks in T, as measured by P, improves with experience E."
Yeah, I know, it will take a moment to understand. But it is the best definition you will encounter.
Let's work through it with an example.
Let's say there is some task, like predicting the age of a person on the basis of a picture.
So T = predicting the age.
Now, we feed some training data to our machine: various pictures and their respective ages.
So E = various pictures and their respective ages.
Next, we somehow devise an algorithm which extracts important features, performs some computations, and predicts the age of the person.
So P = the accuracy of our program in predicting the correct age.
So, by definition, our machine is said to be learning if the accuracy (P) of predicting the age (T) correctly increases as our machine gets more and more data (E).
If you can't relate to it, try finding the E, P, and T for some basic machine learning examples like spam classification and this checkers game.
Moving ahead, we can divide machine learning algorithms based on the type of data provided to us. They are mainly divided into two categories: Supervised and Unsupervised Learning. Several newer types of machine learning have also emerged, such as Reinforcement Learning and Deep Learning, but Supervised and Unsupervised Learning form the basis of them all.
Now, before we dive into each type of learning individually, let's understand the basic difference between the two and some of the terminology associated with machine learning.
While studying machine learning, you'll come across terms like features, labels, and parameters. Features are the attributes of the training data; they are like the characteristics of the training data set.
Suppose we have a dataset of house areas, the number of rooms in each, and their respective prices in USD.
Area (sq. mtrs) | Rooms | Price (USD)
3890            | 3     | 573900
1100            | 3     | 249900
1458            | 3     | 464500
Let's say that, based on this data, the problem we are given is to predict house prices.
The labels (usually denoted as y_{(i)}) would be the prices. That is, we are told to predict prices, and we are already provided some sample prices, so those sample prices are our labels.
Now, features (X_{(i,j)} or x_{(i)}) can be defined as the characteristics or attributes on which our labels depend. So Area and Rooms are our features.
Hence, for a house with an unknown price, we will need its features (Area, Rooms) to predict its label.
Parameters can be thought of as "the weights for the features" on which the labels depend. They are usually denoted as theta (θ). You'll understand them in more detail when you learn about the individual algorithms.
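To make this concrete, here is a minimal sketch of "parameters as weights" for the house example above. The theta values are made up for illustration only; they are not learned from the table.

```python
# Predicted price as a weighted sum of the features plus a bias term.
# The theta values below are hypothetical, not learned from data.
def predict_price(area, rooms, theta):
    theta0, theta1, theta2 = theta  # bias, weight for area, weight for rooms
    return theta0 + theta1 * area + theta2 * rooms

theta = (10000.0, 120.0, 5000.0)  # hypothetical parameters
print(predict_price(1100, 3, theta))  # 10000 + 120*1100 + 5000*3 = 157000.0
```

Learning, in this picture, is just the process of finding good values of θ from the training data.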
In practical scenarios, the training dataset size (m) is usually in the range of 10,000 to 100,000,000, and each training example has somewhere between 100 and 10,000 features (n). So your training data set would be of size (m × n), which is huge.
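For our tiny house dataset, the (m × n) layout looks like this as a NumPy array (here m = 3 examples and n = 2 features, far smaller than a real dataset):

```python
import numpy as np

# The three houses from the table above as an (m x n) feature matrix:
# m = 3 training examples, n = 2 features (area, rooms).
X = np.array([[3890, 3],
              [1100, 3],
              [1458, 3]])
y = np.array([573900, 249900, 464500])  # labels: prices in USD

m, n = X.shape
print(m, n)  # 3 2
```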
So, the simple method of distinguishing between supervised and unsupervised learning algorithms is through the labels (y_{(i)}). If labels are present, it's a task for supervised learning; if no labels are provided, it's an unsupervised learning task.
Formally, let's define and distinguish Supervised and Unsupervised Learning.
Supervised Learning, as said earlier, has labels present to help us train our model. It basically means we are told, "Hey there! Here is some training data, and here are the respective answers. Now go find the possible mathematical relationship between the training data and the labels, and use it to deduce solutions for future problems."
In the case of Unsupervised Learning, on the other hand, we are told, "Hey there, again! Here is some training data, and we don't have any information about what you're supposed to do with it. So just find some patterns and information in this data."
Yeah, unsupervised learning sounds like a black box, but it's quite simple in practice.
Let's delve a bit deeper into both of them.
Now, what can we do with Supervised Learning?
We are given some features, and some answers relating to those features. We can use these to form and define a mathematical relationship between the input training data and the already given output labels. This relationship usually helps in predicting values.
Now, what about the value predicted?
The predicted value can either be continuous (the price of a house), or Boolean (yes or no), or, dividing further, it can be a class (A, B, or C).
Predicting continuous values is done using a type of algorithm called a Regression Algorithm. It creates a function F(x_{(i)}, θ) to predict values (y) for an unknown dataset. This function F(x_{(i)}, θ) can be linear (Linear Regression) or polynomial (Polynomial Regression).
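As a quick sketch of linear regression on the house data from the table above, we can fit F(x, θ) = θ₀ + θ₁ · area with ordinary least squares (using only the area feature, since the number of rooms is the same for all three houses):

```python
import numpy as np

# House data from the table above: area in sq. mtrs, price in USD.
area = np.array([3890.0, 1100.0, 1458.0])
price = np.array([573900.0, 249900.0, 464500.0])

# Add a column of ones so theta0 acts as the intercept, then solve
# the least-squares problem  X_b @ theta ~= price.
X_b = np.column_stack([np.ones_like(area), area])
theta, *_ = np.linalg.lstsq(X_b, price, rcond=None)
theta0, theta1 = theta

def predict(a):
    return theta0 + theta1 * a
```

With only three points this is a toy fit, but the mechanics are the same at any scale: choose θ so that F(x, θ) is as close to the labels as possible.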
Classifying our data into various classes, such as (True, False) or (A, B, C), is done using a Classification Algorithm. It predicts predefined class values (y) for unknown datasets. Classification algorithms can be implemented through Logistic Regression, Neural Networks, Support Vector Machines, Decision Trees, Naïve Bayes Classification, and the k-NN algorithm. All of them are implemented using different techniques, and we might learn about them in the future on this blog.
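To give a feel for classification, here is a toy version of one of the algorithms listed above, k-nearest neighbours (k-NN). The points and labels are made-up illustration data:

```python
from collections import Counter
import math

def knn_predict(train_points, train_labels, query, k=3):
    # Sort training points by Euclidean distance to the query point.
    dists = sorted(
        (math.dist(p, query), label)
        for p, label in zip(train_points, train_labels)
    )
    # Majority vote among the k nearest neighbours.
    votes = Counter(label for _, label in dists[:k])
    return votes.most_common(1)[0][0]

# Two made-up clusters of points with class labels "A" and "B".
points = [(1, 1), (1, 2), (2, 1), (8, 8), (8, 9), (9, 8)]
labels = ["A", "A", "A", "B", "B", "B"]
print(knn_predict(points, labels, (2, 2)))  # "A": nearest neighbours are all A
```

The output is always one of the predefined classes, which is exactly what separates classification from regression.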
That's all for Supervised Learning; however, it is important to understand when a problem is a classification problem and when it is a regression problem. Most people get confused differentiating between the two.
Moving on to Unsupervised Learning: it is used to extract or recognize patterns in data about which we have no information regarding relationships or how important each feature is in the training set. It can be used to cluster similar data or, conversely, to find anomalous data. It is also useful for reducing the dimensionality of the features in the data set.
Grouping similar data sounds just like classification, I know, but the main difference between a classification problem and a clustering problem is this: in classification, we know the number of classes and the classes we want to group into; we know that a mail will be either spam or not spam. In clustering, however, we don't know what the data will be grouped into, or even how many classes/clusters to group it into. Clustering is done using the K-means algorithm, which in my opinion is the simplest of all the machine learning algorithms.
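A minimal sketch of K-means on made-up 1-D data with two clearly separated groups (real implementations add smarter initialisation and stopping criteria, but the two alternating steps are the whole idea):

```python
def kmeans_1d(data, centroids, iterations=10):
    for _ in range(iterations):
        # Assignment step: attach each point to its nearest centroid.
        clusters = [[] for _ in centroids]
        for x in data:
            nearest = min(range(len(centroids)),
                          key=lambda i: abs(x - centroids[i]))
            clusters[nearest].append(x)
        # Update step: move each centroid to the mean of its cluster.
        centroids = [sum(c) / len(c) if c else centroids[i]
                     for i, c in enumerate(clusters)]
    return centroids, clusters

data = [1.0, 1.2, 0.8, 9.0, 9.5, 8.5]  # toy data: two obvious groups
centroids, clusters = kmeans_1d(data, centroids=[0.0, 10.0])
print(centroids)  # roughly [1.0, 9.0]
```

Note that we never told the algorithm which point belongs to which group; it discovered the grouping on its own, which is what makes this unsupervised.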
Anomalous data can be identified using anomaly detection, which is done using the Gaussian (normal) distribution. These problems are purely statistics-based and easy to solve.
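The Gaussian approach can be sketched in a few lines: fit a mean and variance to normal data, then flag any point whose probability density falls below a threshold. The data and the threshold below are chosen by hand for illustration:

```python
import math

# Probability density of a Gaussian with mean mu and variance var.
def gaussian_pdf(x, mu, var):
    return math.exp(-((x - mu) ** 2) / (2 * var)) / math.sqrt(2 * math.pi * var)

normal_data = [10.1, 9.8, 10.3, 9.9, 10.0, 10.2]  # toy "normal" readings
mu = sum(normal_data) / len(normal_data)
var = sum((x - mu) ** 2 for x in normal_data) / len(normal_data)

epsilon = 1e-3  # threshold, picked by hand for this toy example
def is_anomaly(x):
    return gaussian_pdf(x, mu, var) < epsilon

print(is_anomaly(10.0), is_anomaly(25.0))  # False True
```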
Dimensionality Reduction, on the other hand, involves a lot of linear algebra and is done using Principal Component Analysis (PCA). We find a common axis for two different features showing similar characteristics. In layman's terms, if we are given both height and width, we might remove both features and replace them with a single combined one, such as the area, without even recognizing what we have done.
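Here is a small PCA sketch with NumPy on made-up 2-D points that lie almost on a line, compressing the two features into a single principal component:

```python
import numpy as np

# Toy data: the second feature is roughly twice the first, so the points
# lie almost on a line and one component captures nearly all the variance.
X = np.array([[2.0, 4.1], [3.0, 6.0], [4.0, 7.9], [5.0, 10.1], [6.0, 12.0]])
X_centred = X - X.mean(axis=0)  # PCA operates on centred data

# SVD gives the principal directions in the rows of Vt.
U, S, Vt = np.linalg.svd(X_centred, full_matrices=False)
first_component = Vt[0]                  # direction of maximum variance
X_reduced = X_centred @ first_component  # 1-D projection of each point

print(X_reduced.shape)  # (5,) -- two features compressed into one
```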
That's it for this blog. If there are any suggestions or corrections, feel free to mention them in the comment section. Also, if you have any doubts, feel free to ask.
References:
- Machine Learning by Andrew Ng, Coursera.org (among the best MOOCs).
- Various educational blogs on medium.com