It seems to me that judging a prediction model by graphing the results is akin to judging a book by its cover.

A random forest was used to fit the regression model, which was then used for prediction.

The predicted and actual values were plotted on the same axes to visualize the prediction error.

Here are two sets of plots of the predicted values (in red) and actual values (in blue).

First Set: The predicted and actual values were plotted in the order in which they appeared in the data frame.

[Figures: PredictedAndActualBadPersp1, PredictedAndActualBadPersp2]

Conclusion about the model based on the first set of graphs: the model is very accurate. The error (the difference between the actual and predicted values) seems small. The overall hue of the graph is purple because the blue and red plots overlap so heavily.

Second Set: The data frame was sorted by the actual values. The sorted actual values and their corresponding predictions were then plotted.

[Figures: PredictedAndActualGoodPersp1, PredictedAndActualGoodPersp2]

Conclusion about the model based on the second set of graphs: the model is very inaccurate. The predicted values seem to be randomly scattered about the actual values.
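For anyone who wants to reproduce the two views, here is a minimal sketch of the reordering step in Python. It is not the code behind the plots above: the data is synthetic, and the names `sort_by_actual` and `plot_both_orderings` are made up for illustration. The plotting helper assumes matplotlib is installed and is only defined here, not called.

```python
import random


def sort_by_actual(actual, predicted):
    """Reorder both series by ascending actual value, keeping each pair aligned."""
    order = sorted(range(len(actual)), key=lambda i: actual[i])
    return [actual[i] for i in order], [predicted[i] for i in order]


def plot_both_orderings(actual, predicted):
    """Draw the original-order view and the sorted view side by side.

    Assumes matplotlib is available; imported lazily so the rest runs without it.
    """
    import matplotlib.pyplot as plt

    sorted_actual, sorted_predicted = sort_by_actual(actual, predicted)
    fig, axes = plt.subplots(1, 2, figsize=(10, 4), sharey=True)
    views = [
        ("Data frame order", actual, predicted),
        ("Sorted by actual value", sorted_actual, sorted_predicted),
    ]
    for ax, (title, a, p) in zip(axes, views):
        ax.plot(p, color="red", label="predicted")
        ax.plot(a, color="blue", label="actual")
        ax.set_title(title)
        ax.legend()
    plt.show()


# Synthetic stand-in data (an assumption, NOT the data from this post).
rng = random.Random(0)
actual = [rng.uniform(0, 100) for _ in range(200)]
predicted = [a + rng.gauss(0, 15) for a in actual]

sorted_actual, sorted_predicted = sort_by_actual(actual, predicted)
```

Note that sorting changes nothing about the predictions themselves. Each prediction still sits next to the same actual value; only the position along the x-axis changes.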

 

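Overall conclusion: The same model and the same predictions gave opposite impressions depending only on the order of the rows, so use something other than a plot of the results to determine the accuracy of a predictive model.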
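One candidate for that "something other" is a numerical error summary. The sketch below is my own illustration, not something computed for the model in this post: it uses a tiny made-up data set and two common metrics, root mean squared error (RMSE) and mean absolute error (MAE), and shows that neither depends on the order of the rows.

```python
import math


def rmse(actual, predicted):
    """Root mean squared error over paired values."""
    squared = [(a - p) ** 2 for a, p in zip(actual, predicted)]
    return math.sqrt(sum(squared) / len(squared))


def mae(actual, predicted):
    """Mean absolute error over paired values."""
    absolute = [abs(a - p) for a, p in zip(actual, predicted)]
    return sum(absolute) / len(absolute)


# Tiny made-up example (an assumption, NOT the data from this post).
actual = [3.0, 10.0, 1.0, 7.0]
predicted = [5.0, 8.0, 2.0, 7.0]

# The same pairs, reordered by ascending actual value.
pairs = sorted(zip(actual, predicted))
sorted_actual = [a for a, _ in pairs]
sorted_predicted = [p for _, p in pairs]

print(rmse(actual, predicted), rmse(sorted_actual, sorted_predicted))
print(mae(actual, predicted), mae(sorted_actual, sorted_predicted))
```

Both orderings give the same RMSE and the same MAE, which is exactly the property the plots lacked.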

