Performance Metrics

Sonu Ranjit Jacob
3 min read · Jul 9, 2020

Performance metrics are what we use to judge how well our model performs. There are various performance metrics: accuracy, precision, recall, mean absolute error, mean squared error, and so on.

It is extremely important to know which kind of metric to use; otherwise we will be judging our model wrongly.

In the case of classification, using accuracy as a metric makes sense because we can count how many datapoints were correctly and incorrectly classified. Suppose I have a model that is required to predict whether a given object is an apple or a pear, and I take identifying an apple as my true positive case. The confusion matrix then has four cells: true positives (apples predicted as apples), false negatives (apples predicted as pears), false positives (pears predicted as apples) and true negatives (pears predicted as pears).

The confusion matrix gives an overview of how the points were classified. Accuracy is then the fraction of points that were classified correctly: accuracy = (TP + TN) / (TP + TN + FP + FN).
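As a minimal sketch (the labels below are made up purely for illustration), scikit-learn can compute both the confusion matrix and the accuracy directly from the true and predicted labels:

```python
from sklearn.metrics import confusion_matrix, accuracy_score

# Hypothetical labels for the apple-vs-pear classifier; "apple" is the positive class.
y_true = ["apple", "apple", "pear", "apple", "pear", "pear", "apple", "pear"]
y_pred = ["apple", "pear",  "pear", "apple", "apple", "pear", "apple", "pear"]

# Rows are the actual classes, columns the predicted classes.
print(confusion_matrix(y_true, y_pred, labels=["apple", "pear"]))
# [[3 1]
#  [1 3]]

# Accuracy = (TP + TN) / total = (3 + 3) / 8
print(accuracy_score(y_true, y_pred))  # 0.75
```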

We can also calculate other metrics like precision, recall, the true positive rate, the false positive rate and the ROC curve from the same confusion matrix (or, for the ROC curve, from the model's scores at different thresholds). Scikit-learn gives an excellent explanation of these in its documentation.
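For the ROC curve and the area under it, the model has to output a score or probability rather than a hard label, since the curve is traced out by sweeping the decision threshold. A minimal sketch with made-up probabilities:

```python
from sklearn.metrics import roc_curve, roc_auc_score

# Hypothetical ground truth (1 = apple) and predicted probabilities of "apple".
y_true = [1, 1, 0, 1, 0, 0, 1, 0]
y_score = [0.9, 0.4, 0.35, 0.8, 0.6, 0.2, 0.7, 0.1]

# False positive rate and true positive rate at each threshold.
fpr, tpr, thresholds = roc_curve(y_true, y_score)

print(roc_auc_score(y_true, y_score))  # 0.9375 for these made-up scores
```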

Precision is the fraction of points the model labelled positive that really are positive: TP / (TP + FP). It tells us how much we can trust a positive prediction.

Recall is the fraction of actually positive points (the relevant instances) that the model manages to find: TP / (TP + FN).

In medical applications, recall is usually the most important. Consider a model that classifies people according to whether a disease is detected or not, with "disease present" taken as the positive class; its confusion matrix has the same structure as the one above.

Here it is far more important to identify all the people who have the disease than to avoid mistakenly telling someone they have the disease when they actually do not, so we want the recall to be as close to 1 as possible.
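A small sketch with hypothetical screening results makes the difference between the two metrics concrete:

```python
from sklearn.metrics import precision_score, recall_score

# Hypothetical screening results: 1 = disease present, 0 = disease absent.
y_true = [1, 1, 1, 1, 0, 0, 0, 0]
y_pred = [1, 1, 1, 0, 1, 1, 0, 0]  # one sick patient missed, two healthy people flagged

print(precision_score(y_true, y_pred))  # TP / (TP + FP) = 3 / 5 = 0.60
print(recall_score(y_true, y_pred))     # TP / (TP + FN) = 3 / 4 = 0.75
```

In a screening setting, the one missed patient (the false negative) is the costly mistake, so we would tune the model to push recall up even if precision drops.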

There is also another metric called the f1-score, the harmonic mean of precision and recall, which weighs both of them equally. A detailed explanation can be found in the references below.
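A minimal sketch, reusing the hypothetical disease-screening labels from above:

```python
from sklearn.metrics import f1_score

# Same hypothetical labels as in the previous snippet.
y_true = [1, 1, 1, 1, 0, 0, 0, 0]
y_pred = [1, 1, 1, 0, 1, 1, 0, 0]

# F1 = 2 * precision * recall / (precision + recall)
print(f1_score(y_true, y_pred))  # 2 * 0.60 * 0.75 / (0.60 + 0.75) ≈ 0.667
```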

However, when it comes to regression, we have a real-valued output, and in this case we cannot expect to predict the exact value. Consider the example of a model that predicts the cost of a house based on its area (link to example). If we were to draw a confusion matrix, it would look like this:

Confusion matrix for a model that predicts the price of a house

Hence, in regression problems it makes no sense to use accuracy or a confusion matrix to evaluate performance: since almost every predicted value is different, we would end up with a separate entry for each house, or in general for each datapoint.

Instead, in regression problems, using the mean absolute error or the mean squared error is a more effective way of judging the performance of the model.

In these error-based metrics, we calculate the difference between the actual and predicted values for each datapoint and average it. If the resulting error is small relative to the scale of the values being predicted, we judge our model to be sufficiently accurate.

The reason for taking the absolute or the squared value of each error is that individual errors can have opposite signs and would otherwise cancel each other out when averaged.
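A minimal sketch with made-up house prices shows both metrics, and why the raw signed errors are not enough on their own:

```python
import numpy as np
from sklearn.metrics import mean_absolute_error, mean_squared_error

# Hypothetical house prices (in thousands) and a model's predictions.
y_true = np.array([250, 300, 420, 500])
y_pred = np.array([270, 280, 440, 480])

errors = y_true - y_pred
print(errors.mean())                        # 0.0: the signed errors cancel out

print(mean_absolute_error(y_true, y_pred))  # 20.0
print(mean_squared_error(y_true, y_pred))   # 400.0
```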

So make sure to identify the type of machine learning problem and select the appropriate metric when designing your model!

Happy learning.

References:

  1. https://towardsdatascience.com/understanding-confusion-matrix-a9ad42dcfd62
  2. https://simonhessner.de/why-are-precision-recall-and-f1-score-equal-when-using-micro-averaging-in-a-multi-class-problem/
