To evaluate whether a regression model predicts well, you can use metrics such as R2, RMSLE, RMSE, MSE, and MAE. But what is the difference between the normal and the logarithmic RMSE?
What are the metrics used for?
When we predict a categorical variable, it is easy to tell whether a prediction is correct, because the predicted class either matches the actual class or it does not. When we predict a continuous variable, this is harder: a prediction is rarely exactly equal to the actual value, so the question becomes how far off it is. For this reason, there are different metrics that measure the error by comparing the predicted value with the actual value.
Usually, evaluating a model requires looking at several metrics, since each one represents the error differently. But why does the same metric exist in both a normal and a logarithmic version?
- Root-mean-squared error (RMSE)
- Root-mean-squared-log error (RMSLE)
Root-mean-squared error - RMSE
The RMSE squares the differences between the predicted and actual values, averages those squared differences, and then takes the square root so that the result is expressed in the same units as the target.
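The calculation described above can be sketched in a few lines of plain Python. This is a minimal illustration, not a library implementation; the function name `rmse` and the sample values are hypothetical and chosen only for demonstration.

```python
import math

def rmse(actual, predicted):
    """Square root of the mean of the squared differences."""
    squared_errors = [(a - p) ** 2 for a, p in zip(actual, predicted)]
    return math.sqrt(sum(squared_errors) / len(squared_errors))

# Hypothetical values, used only to illustrate the calculation.
actual = [60, 80, 90]
predicted = [67, 78, 91]

print(rmse(actual, predicted))
```

Because the differences are squared before averaging, a single large error contributes much more to the result than several small ones.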
Root-mean-squared-log error - RMSLE
This metric is calculated in the same way, except that the predicted and actual values of the dependent variable are first transformed to a logarithmic scale.
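As a rough sketch, the same calculation on the logarithmic scale might look like the code below. It assumes the common convention of taking log(1 + x) rather than log(x), so that a value of zero is still defined; the article does not say which variant it uses. The function name `rmsle` is hypothetical.

```python
import math

def rmsle(actual, predicted):
    """RMSE computed on log(1 + x) of the actual and predicted values.

    Assumes every value is greater than -1, otherwise the logarithm
    is undefined.
    """
    squared_log_errors = [
        (math.log1p(p) - math.log1p(a)) ** 2
        for a, p in zip(actual, predicted)
    ]
    return math.sqrt(sum(squared_log_errors) / len(squared_log_errors))

# Hypothetical values, used only to illustrate the calculation.
actual = [60, 80, 90]
predicted = [67, 78, 91]

print(rmsle(actual, predicted))
```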
Where are they really different?
The best way to understand each metric is by using some examples.
Imagine that we have a simple predictive model, for example, a linear regression that predicts the following values.
The metrics for these values would be:
One difference is the influence that outliers have on the error. When the values are transformed to a logarithmic scale, large values are compressed, and so is the error they produce. This is known as robustness.
We will calculate the metrics again after adding one outlier observation to the table above.
If we look at the metrics again, we can see that the RMSE is strongly affected: it increases sharply because of the new observation.
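The effect of a single outlier can be reproduced with a small sketch. The observations below are hypothetical and are not the article's original table; the RMSLE uses the log(1 + x) convention, which is an assumption. With these values, the RMSE grows far more than the RMSLE once the outlier is added.

```python
import math

def rmse(actual, predicted):
    errors = [(a - p) ** 2 for a, p in zip(actual, predicted)]
    return math.sqrt(sum(errors) / len(errors))

def rmsle(actual, predicted):
    # Assumes the log(1 + x) convention and values greater than -1.
    errors = [(math.log1p(p) - math.log1p(a)) ** 2
              for a, p in zip(actual, predicted)]
    return math.sqrt(sum(errors) / len(errors))

# Hypothetical observations (not the article's original data).
actual = [60, 80, 90]
predicted = [67, 78, 91]

# The same data plus one hypothetical outlier with a large error.
actual_out = actual + [750]
predicted_out = predicted + [102]

for label, a, p in [("without outlier", actual, predicted),
                    ("with outlier", actual_out, predicted_out)]:
    print(f"{label}: RMSE={rmse(a, p):.3f}  RMSLE={rmsle(a, p):.3f}")
```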
This effect can also be seen on a graph: the logarithmic error curve is not symmetric. One of its sides is flatter than the other, so RMSLE penalizes underestimation more than overestimation.
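The asymmetry can be checked numerically with one observation that is missed by the same absolute amount in both directions. This is a sketch with hypothetical numbers, again assuming the log(1 + x) convention for RMSLE.

```python
import math

def rmse(actual, predicted):
    errors = [(a - p) ** 2 for a, p in zip(actual, predicted)]
    return math.sqrt(sum(errors) / len(errors))

def rmsle(actual, predicted):
    # Assumes the log(1 + x) convention and values greater than -1.
    errors = [(math.log1p(p) - math.log1p(a)) ** 2
              for a, p in zip(actual, predicted)]
    return math.sqrt(sum(errors) / len(errors))

# Hypothetical case: the actual value is 1000 and the prediction
# misses by 400 in each direction.
actual = [1000]
under = [600]    # underestimation
over = [1400]    # overestimation

print("RMSE  under/over:", rmse(actual, under), rmse(actual, over))
print("RMSLE under/over:", rmsle(actual, under), rmsle(actual, over))
```

The RMSE treats both predictions identically, while the RMSLE is larger for the underestimate.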
When the difference between the actual and predicted values grows in magnitude, the RMSE grows by the same magnitude. This is not the case for RMSLE, which depends on the relative difference (the ratio) between the two values rather than on the absolute difference.
Comparing with the initial data, we can see that for RMSLE the scale of the error does not matter.
I have shown the advantages that RMSLE has over RMSE. However, when you choose metrics for a model, it is important to understand which ones matter most for your problem, because RMSLE is not the best choice for every model.
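A short sketch makes the point about scale concrete: two predictions that are each 10% too low, at very different magnitudes. The numbers are hypothetical and the RMSLE again assumes the log(1 + x) convention, which is why the two RMSLE values are close but not exactly equal.

```python
import math

def rmse(actual, predicted):
    errors = [(a - p) ** 2 for a, p in zip(actual, predicted)]
    return math.sqrt(sum(errors) / len(errors))

def rmsle(actual, predicted):
    # Assumes the log(1 + x) convention and values greater than -1.
    errors = [(math.log1p(p) - math.log1p(a)) ** 2
              for a, p in zip(actual, predicted)]
    return math.sqrt(sum(errors) / len(errors))

# Hypothetical cases: the same 10% relative error at two scales.
small_actual, small_pred = [100], [90]
large_actual, large_pred = [10000], [9000]

print("RMSE :", rmse(small_actual, small_pred), rmse(large_actual, large_pred))
print("RMSLE:", rmsle(small_actual, small_pred), rmsle(large_actual, large_pred))
```

The RMSE is 100 times larger in the second case, while the RMSLE is almost unchanged.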