Introduction To Machine Learning!

Sonu Ranjit Jacob
4 min read · May 22, 2020

Machine learning has been a buzzword for quite some time now, and for good reason. From movie recommendations (Netflix) to spam filtering (Gmail) and personalised cancer care (IBM Watson Genomics), machine learning has found its way into many aspects of our lives.

Consider the picture shown. If I ask you to pick an orange, you will easily pick the item in the middle. But how do you know that it is an orange? It is because you have seen a number of oranges before and so you use your experience to make a decision.

Machine learning works in the same way: by training a model on some information, we make the machine learn to do something.

Arthur Samuel defined machine learning in 1959 as “the field of study that gives computers the ability to learn without being explicitly programmed.”

In his book ‘Machine Learning’, Tom Mitchell defines it as follows: “A computer program is said to learn from experience E with respect to some class of tasks T and performance measure P, if its performance at tasks in T, as measured by P, improves with experience E.” [1]

It sounds like we can teach a machine to do something on its own, right? That is precisely the idea, but it depends on something very important: data!

Figure 1: Excel sheet data for cost of a house according to its size

Suppose I have an Excel sheet of house data in which the first row contains the area of each house in square meters and the second row contains the corresponding cost. If I plot this data, I get a straight line.

Figure 2: Plot of area versus cost of house

Using this graph, I can predict how much a house with an area of 3500 square meters could cost. I draw a vertical line up from 3500 on the x-axis until it meets the plotted line, then a horizontal line from that point across to the y-axis, which it meets at $175,000. In other words, I use the available data to predict the cost of a house from its area.

Figure 3: Prediction of cost of house with area 3500 sq. meters
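Here is a minimal sketch of that prediction in Python. The data points are made up for illustration (they are not the values from Figure 1); I only chose them so that they lie on a straight line consistent with the one prediction quoted above. The helper names `fit_line` and `predict_cost` are my own, and the fit is an ordinary least-squares line, which is one common way to "draw" the line through the data.

```python
# Hypothetical (area, cost) pairs, invented for illustration only.
# They are NOT the article's spreadsheet values; they are chosen to lie on a
# straight line consistent with the quoted prediction (3500 sq. m -> $175,000).
areas = [1000.0, 1500.0, 2000.0, 2500.0, 3000.0]          # square meters
costs = [50000.0, 75000.0, 100000.0, 125000.0, 150000.0]  # dollars


def fit_line(xs, ys):
    """Fit y = slope * x + intercept by ordinary least squares."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    # Covariance of x and y, and variance of x (both unnormalised).
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return slope, intercept


def predict_cost(area, slope, intercept):
    """Read the cost off the fitted line for a given area."""
    return slope * area + intercept


slope, intercept = fit_line(areas, costs)
predicted = predict_cost(3500.0, slope, intercept)
print(f"cost = {slope:.2f} * area + {intercept:.2f}")
print(f"Predicted cost for 3500 sq. m: ${predicted:,.0f}")
```

With real data the points would not sit exactly on a line, and the fitted line would be the one that keeps the overall error between the line and the points as small as possible.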

I can write a linear equation for this model so that, given the area of a house, it outputs the predicted cost. Of course, machine learning is rarely as simple as this, and usually much more information is needed to make such a prediction. The location of the house, the number of bedrooms and bathrooms, and the availability of nearby facilities like a park or gym also influence its cost. Still, learning a relationship from existing data and using it to predict new cases is the fundamental concept of machine learning. For now I will focus on an overview of machine learning, and I will explain more about practical implementations in my coming posts.

Machine learning is usually divided into three main types:

  1. Supervised learning: For every example in the training data, we know what the correct output of the model should be. For example, given a set of images labelled as dogs or cats, we know what the model should answer when I give it an image of a dog and ask it to classify it as a dog or a cat.
  2. Unsupervised learning: Given a dataset, we do not know what the output of the model should be. In this case we group similar data together into clusters. An example is a recommender system that groups users together based on the genres they watch. Here we initially do not know which genre (output) a user prefers, but based on their viewing history we add them to a particular group, say horror.
  3. Reinforcement learning: This type of machine learning is concerned with training a model to choose actions that obtain the maximum reward. Teaching Pacman to win the game without being caught by ghosts is an example of this.
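To make the difference between the first two types concrete, here is a small sketch in Python. Everything in it is an assumption for illustration: the features (weight and ear length), the viewing hours, and the two simple techniques I picked (a 1-nearest-neighbour classifier for the supervised case and a two-group version of k-means for the unsupervised case). Real systems work on images and viewing histories with far richer models.

```python
# Supervised learning: every training example comes with the correct label.
# Hypothetical features: (weight in kg, ear length in cm).
labelled_animals = [
    ((4.0, 6.5), "cat"),
    ((5.0, 7.0), "cat"),
    ((20.0, 11.0), "dog"),
    ((30.0, 12.0), "dog"),
]


def classify(sample, examples):
    """1-nearest-neighbour: return the label of the closest labelled example."""
    def squared_distance(a, b):
        return sum((p - q) ** 2 for p, q in zip(a, b))

    nearest = min(examples, key=lambda ex: squared_distance(sample, ex[0]))
    return nearest[1]


# Unsupervised learning: no labels at all, only raw values to group.
# Hypothetical data: hours of horror movies watched per week, one value per user.
horror_hours = [0.5, 1.0, 1.5, 8.0, 9.0, 10.0]


def two_means(values, iterations=10):
    """Split 1-D values into two groups around two centres (k-means with k=2)."""
    low, high = min(values), max(values)
    if low == high:
        # All values are identical, so there is nothing to separate.
        return list(values), []
    for _ in range(iterations):
        # Assign each value to its nearer centre, then move each centre
        # to the mean of the values assigned to it.
        group_low = [v for v in values if abs(v - low) <= abs(v - high)]
        group_high = [v for v in values if abs(v - low) > abs(v - high)]
        low = sum(group_low) / len(group_low)
        high = sum(group_high) / len(group_high)
    return group_low, group_high


print(classify((28.0, 12.0), labelled_animals))
casual_viewers, horror_fans = two_means(horror_hours)
print(casual_viewers, horror_fans)
```

Notice that `classify` needs the labels "cat" and "dog" to be supplied up front, while `two_means` discovers the two groups on its own and leaves it to us to decide what to call them.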

Machine learning is also applied in a number of domains, such as:

  1. Natural Language Processing: predictive text
  2. Computer Vision: autonomous driving, face recognition
  3. Time Series Forecasting: stock market prediction, sunspot prediction

In my opinion, domain knowledge is important for every machine learning project, and people usually stick to two or three of these domains. So if you decide to pursue machine learning, I suggest you first do one simple project in each domain to figure out what interests you. I will be adding a project (most likely in Natural Language Processing) in coming posts.

That’s all for now. In my next post I will review the math preliminaries needed for machine learning!

References:

  1. Mitchell, T. M. (1997). Machine Learning. New York: McGraw-Hill.
  2. https://research.netflix.com/research-area/machine-learning
  3. https://www.ibm.com/in-en/marketplace/watson-for-genomics
