Difference between RMSE and RMSLE

To evaluate whether a regression model predicts well, you can use metrics such as R², RMSLE, RMSE, MSE and MAE. But what is the difference between the normal and the logarithmic RMSE?

What are the metrics used for?

When we predict a categorical variable, it is easy to tell whether a prediction is correct, because it either matches the actual class or it does not. When we predict a continuous variable, this is harder, because a prediction is rarely exactly right. For this reason, there are different metrics that measure the error by comparing the predicted value with the actual value.

Usually, evaluating a model requires looking at several metrics, since each one represents the error differently. But why does the same metric exist in a normal and a logarithmic version? The two metrics compared here are:

- Root-mean-squared error (RMSE)

- Root-mean-squared-log error (RMSLE)

Root-mean-squared error - RMSE

The RMSE averages the squared differences between the predicted and actual values and then takes the square root, so that the result is expressed in the same units as the target variable.

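RMSE formula, where y_i is the actual value, ŷ_i the predicted value and n the number of observations:

$$\mathrm{RMSE} = \sqrt{\frac{1}{n}\sum_{i=1}^{n}\left(\hat{y}_i - y_i\right)^2}$$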
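The definition above can be sketched in a few lines of Python. This is a minimal illustration using only the standard library; the function name `rmse` and the sample values are assumptions made for this example, not something taken from a particular library.

```python
import math

def rmse(actual, predicted):
    """Root-mean-squared error of two equal-length sequences of numbers."""
    if len(actual) != len(predicted) or len(actual) == 0:
        raise ValueError("inputs must be non-empty and of equal length")
    # Square each residual, average the squares, then take the square root.
    mean_squared_error = sum(
        (p - a) ** 2 for a, p in zip(actual, predicted)
    ) / len(actual)
    return math.sqrt(mean_squared_error)

# Illustrative values only: residuals of 3 and 4 give sqrt((9 + 16) / 2).
print(rmse([0, 0], [3, 4]))
```

Because of the final square root, the result is in the same units as the target variable, which makes it easier to interpret than the MSE.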

Root-mean-squared-log error - RMSLE

This metric is calculated in the same way, except that the predicted and actual values of the dependent variable are first transformed with a logarithm.

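RMSLE formula, with the same notation as for the RMSE. The figures in this article correspond to the common log(1 + x) variant with the natural logarithm, which also keeps the metric defined when a value is zero:

$$\mathrm{RMSLE} = \sqrt{\frac{1}{n}\sum_{i=1}^{n}\left(\log(\hat{y}_i + 1) - \log(y_i + 1)\right)^2}$$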
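As a rough sketch, the logarithmic version only changes what is subtracted. The code below assumes the log(1 + x) variant with the natural logarithm, which is the variant that reproduces the figures in this article. The name `rmsle` and the sample values are assumptions made for illustration.

```python
import math

def rmsle(actual, predicted):
    """Root-mean-squared-log error, using log(1 + x) on both inputs."""
    if len(actual) != len(predicted) or len(actual) == 0:
        raise ValueError("inputs must be non-empty and of equal length")
    if any(v <= -1 for v in list(actual) + list(predicted)):
        raise ValueError("log(1 + x) is only defined for x > -1")
    # Same steps as RMSE, applied to log-transformed values.
    mean_squared_log_error = sum(
        (math.log1p(p) - math.log1p(a)) ** 2 for a, p in zip(actual, predicted)
    ) / len(actual)
    return math.sqrt(mean_squared_log_error)

# Illustrative values only: log(20) - log(10) = log(2), so the error
# depends on the ratio (1 + predicted) / (1 + actual).
print(rmsle([9], [19]))
```

A difference of logarithms is the logarithm of a ratio, which is why RMSLE behaves like a relative error rather than an absolute one.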

Where are they really different?

The best way to understand each metric is with some examples.

Imagine that we have a simple predictive model, for example, a linear regression that predicts the following values.

Real Value    Prediction
2             4
3             6

The metrics for these values would be:

RMSE: 2.5495

RMSLE: 0.5358
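The two figures above can be reproduced with a short script. This sketch assumes the log(1 + x) form of RMSLE with the natural logarithm; the helper names `rmse` and `rmsle` are chosen for this example.

```python
import math

def rmse(actual, predicted):
    # Square root of the mean squared difference.
    return math.sqrt(sum((p - a) ** 2 for a, p in zip(actual, predicted)) / len(actual))

def rmsle(actual, predicted):
    # Same as rmse, but on log(1 + x) of each value (assumed variant).
    return math.sqrt(
        sum((math.log1p(p) - math.log1p(a)) ** 2 for a, p in zip(actual, predicted))
        / len(actual)
    )

actual = [2, 3]      # "Real Value" column of the table
predicted = [4, 6]   # "Prediction" column of the table

print(f"RMSE: {rmse(actual, predicted):.4f}")    # RMSE: 2.5495
print(f"RMSLE: {rmsle(actual, predicted):.4f}")  # RMSLE: 0.5358
```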

Outliers

One difference is how strongly outliers influence the error. The logarithmic transformation compresses large values, so the error they contribute is compressed as well. This property is known as robustness to outliers.

We will calculate the metrics by adding one outlier observation in the table above.

Real Value    Prediction
2             4
3             6
50            100

Looking at the metrics again, the RMSE has increased sharply because of the new observation, while the RMSLE has barely changed.

RMSE: 28.9425

RMSLE: 0.5891
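To see how much each metric reacts to the extra observation, the following sketch computes both metrics with and without it and prints the growth factor. It assumes the log(1 + x) form of RMSLE; the helper and variable names are chosen for this example.

```python
import math

def rmse(actual, predicted):
    return math.sqrt(sum((p - a) ** 2 for a, p in zip(actual, predicted)) / len(actual))

def rmsle(actual, predicted):
    # Assumed variant: natural log of (1 + x).
    return math.sqrt(
        sum((math.log1p(p) - math.log1p(a)) ** 2 for a, p in zip(actual, predicted))
        / len(actual)
    )

base_actual, base_predicted = [2, 3], [4, 6]
# Same data plus the outlier row (50, 100) from the table.
out_actual, out_predicted = base_actual + [50], base_predicted + [100]

rmse_growth = rmse(out_actual, out_predicted) / rmse(base_actual, base_predicted)
rmsle_growth = rmsle(out_actual, out_predicted) / rmsle(base_actual, base_predicted)

print(f"RMSE grows by a factor of {rmse_growth:.2f}")
print(f"RMSLE grows by a factor of {rmsle_growth:.2f}")
```

With these values the RMSE grows by more than a factor of ten, while the RMSLE grows by roughly a tenth.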

This effect can also be seen on a graph: the logarithm curve is not symmetric. It is steeper for small values and flatter for large ones, so for the same absolute error RMSLE penalizes underestimation more than overestimation.
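The asymmetry can be checked numerically. In the sketch below, the true value of 100 and the two predictions are hypothetical values chosen for illustration; both predictions miss by the same absolute amount, one from below and one from above. The RMSLE again assumes the log(1 + x) form.

```python
import math

def rmse(actual, predicted):
    return math.sqrt(sum((p - a) ** 2 for a, p in zip(actual, predicted)) / len(actual))

def rmsle(actual, predicted):
    # Assumed variant: natural log of (1 + x).
    return math.sqrt(
        sum((math.log1p(p) - math.log1p(a)) ** 2 for a, p in zip(actual, predicted))
        / len(actual)
    )

actual = [100]   # hypothetical true value
under = [50]     # underestimates by 50
over = [150]     # overestimates by 50

# RMSE treats both misses identically.
print(rmse(actual, under), rmse(actual, over))     # 50.0 50.0
# RMSLE is larger for the underestimate.
print(rmsle(actual, under) > rmsle(actual, over))  # True
```

This matters in practice: a model tuned on RMSLE is nudged towards predicting slightly high rather than slightly low.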

Relative Error

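When the error between the actual and predicted values grows in magnitude, the RMSE grows by the same factor. The RMSLE does not, because it depends on the ratio between the predicted and actual values rather than on their absolute difference. Here the initial data is multiplied by ten: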

Real Value    Prediction
20            40
30            60

RMSE: 25.4951

RMSLE: 0.6730

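Compared with the initial data, the RMSE is ten times larger, while the RMSLE changes only slightly (from about 0.54 to about 0.67). For RMSLE, the scale of the error matters far less than its relative size.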
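The comparison can be made explicit by dividing each metric on the scaled data by its value on the initial data. This is a sketch under the same assumption as before (log(1 + x) form of RMSLE); the helper and variable names are chosen for this example.

```python
import math

def rmse(actual, predicted):
    return math.sqrt(sum((p - a) ** 2 for a, p in zip(actual, predicted)) / len(actual))

def rmsle(actual, predicted):
    # Assumed variant: natural log of (1 + x).
    return math.sqrt(
        sum((math.log1p(p) - math.log1p(a)) ** 2 for a, p in zip(actual, predicted))
        / len(actual)
    )

small_actual, small_predicted = [2, 3], [4, 6]
# The same observations multiplied by ten.
large_actual = [10 * v for v in small_actual]
large_predicted = [10 * v for v in small_predicted]

rmse_ratio = rmse(large_actual, large_predicted) / rmse(small_actual, small_predicted)
rmsle_ratio = rmsle(large_actual, large_predicted) / rmsle(small_actual, small_predicted)

print(f"RMSE ratio: {rmse_ratio:.2f}")    # RMSE ratio: 10.00
print(f"RMSLE ratio: {rmsle_ratio:.2f}")  # RMSLE ratio: 1.26
```

The RMSLE ratio is not exactly 1 because of the `+ 1` inside the logarithm: the larger the values, the less that constant matters, and the closer each term gets to the pure log ratio log(2).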

These examples show the advantages that RMSLE has over RMSE. However, when evaluating a model it is important to understand which metric fits the problem, because RMSLE is not the best choice for every model.

