
I'm not sure whether the problem is with my regression estimators or with my understanding of what the r^2 measure of fit actually means. I am working on a project using scikit-learn and ~11 different regression estimators to produce (rough!) predictions of baseball fantasy performance. Certain models always fare better than others: Decision Tree Regression and Extra Tree Regression produce the worst r^2 scores, while ElasticNetCV and LassoCV produce the best, and every once in a while even a slightly positive number!

If a horizontal line produces an r^2 score of 0, then even if all my models were worthless, had literally zero predictive value, and were spitting out numbers entirely at random, shouldn't I still get small positive r^2 values sometimes, from sheer dumb luck alone? 8 of the 11 estimators I use, despite running over different datasets hundreds of times, have never once produced even a tiny positive r^2.
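To illustrate, here is a toy reproduction (pure random numbers, nothing to do with my actual df4) of the behavior I keep seeing:

# Toy check: score random "predictions" against unrelated random targets.
import numpy as np
from sklearn.metrics import r2_score

rng = np.random.default_rng(0)
scores = [r2_score(rng.normal(size=50), rng.normal(size=50))
          for _ in range(1000)]
print(sum(s > 0 for s in scores), "of 1000 random trials scored above 0")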

Am I misunderstanding how r^2 works?

I am not switching the argument order in sklearn's scoring functions either; I have double-checked this many times. When I do pass them in the wrong order (y_pred, y_true), it yields hugely negative r^2 values (like < -50 big).
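For what it's worth, a toy example of that asymmetry: with the arguments swapped, the variance in the denominator comes from the low-variance predictions, so the score blows up:

# Toy demonstration that r2_score is not symmetric in its arguments.
import numpy as np
from sklearn.metrics import r2_score

y_true = np.array([0.0, 1.0, 2.0, 3.0])
y_pred = np.array([1.4, 1.5, 1.6, 1.7])   # low-variance predictions

print(r2_score(y_true, y_pred))  # correct order: about 0.19
print(r2_score(y_pred, y_true))  # swapped: about -80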

The fact that that's the case actually adds to my confusion about how r^2 here is a measure of fit, but I digress...

# I don't know whether I'm supposed to include my df4 or even a
# sample, but suffice it to say, here is just a single row to show
# what kind of data we have. It is all normalized and/or z-scored.
"""

>>> print(df4.head(1))

        HomeAway  ParkFactor    Salary  HandedVs  Hand  oppoBullpen  \
Points                                                                       
3.0          1.0      -1.229 -0.122111       1.0          0.0     -0.90331   

        RibRunHistory  BibTibHistory  GrabBagHistory  oppoTotesRank  \
Points                                                                
3.0          0.964943       0.806874       -0.224993      -0.846859   

        oppoSwipesRank  oppoWalksRank      Temp    Precip  WindSpeed  \
Points                                                                 
3.0           -1.40371      -1.159115 -0.665324 -0.380048  -0.365671   

        WindDirection  oppoPositFantasy  oppoFantasy  
Points                                                
3.0          0.229944         -1.011505     0.919269  

"""



from sklearn.model_selection import train_test_split, cross_val_score
from sklearn.linear_model import ElasticNetCV
from sklearn.metrics import r2_score


def ElasticNetValidation(df4):
    # Features are the columns; the target (Points) lives in the index.
    X = df4.values
    y = df4.index
    X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.1)

    # Fit on the training split and score on the held-out 10%.
    ENTrain = ElasticNetCV(cv=20)
    ENTrain.fit(X_train, y_train)
    y_pred = ENTrain.predict(X_test)

    # Separate model fit on all the data, used for cross-validation below.
    EN = ElasticNetCV(cv=20)
    ENModel = EN.fit(X, y)

    print('ElasticNet R^2: ' + str(r2_score(y_test, y_pred)))
    # cross_val_score uses the estimator's default scorer, which is R^2 here.
    scores = cross_val_score(ENModel, X, y, cv=20)
    print("ElasticNet Accuracy: %0.2f (+/- %0.2f)" % (scores.mean(), scores.std() * 2))

    return ENModel
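I call it like this (assuming df4 is built as in the sample above):

ENModel = ElasticNetValidation(df4)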

When I run this estimator, along with the ten other regression estimators I have been experimenting with, both r2_score() and cross_val_score().mean() come out negative nearly every time. Certain estimators ALWAYS produce negative scores that are not even close to zero (DecisionTreeRegressor, ExtraTreeRegressor). Certain estimators fare better and sometimes even produce a tiny positive score, never more than 0.01 though, and even those estimators (ElasticNetCV, LassoCV, LinearRegression) are negative most of the time, albeit only slightly.

Even if these models I'm building are horrible, say they are totally random and have no predictive power whatsoever when it comes to the target: shouldn't they predict better than a plain horizontal line as often as not? How is it that an unrelated model predicts POORER than a horizontal line so consistently?

  • Try manually calculating the R-squared (R2) value as "R2 = 1.0 - (numpy.var(regression_error) / numpy.var(dependent_data))" and compare values. In my understanding, the R-squared values should not be negative unless there is a problem with the regression. – James Phillips Jun 02 '19 at 11:19
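A minimal sketch of the manual check suggested in this comment (toy arrays standing in for the real residuals and target):

# Manual R-squared per the comment above: R2 = 1 - var(error) / var(y).
import numpy as np

dependent_data = np.array([3.0, 1.0, 4.0, 1.5, 5.0])   # toy y_true
predictions = np.array([2.5, 1.2, 3.8, 2.0, 4.6])      # toy y_pred
regression_error = dependent_data - predictions

R2 = 1.0 - (np.var(regression_error) / np.var(dependent_data))
print(R2)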

1 Answer


You most likely have an overfitting problem. As you correctly note, negative R2 values can occur when a model performs worse than just fitting an intercept term. Your models probably do not capture any 'real' underlying dependence; they merely fit random noise. You are computing the R2 score on a small test set, and it is quite possible that this fitting of noise yields consistently worse results on the test set than a simple intercept term would.

This is a typical case of the bias-variance tradeoff. Your models have low bias and high variance and therefore perform poorly on the test data. Certain models aim at reducing overfitting/variance, for example the Lasso and Elastic Net, and those are indeed among the models that you see performing better.
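Here is a hedged illustration with synthetic pure-noise data (not your df4): a high-variance model memorizes the training noise and scores well below zero on held-out data, while a regularized model shrinks toward the mean and stays near zero:

# Synthetic pure-noise data: the target is unrelated to the features.
import numpy as np
from sklearn.model_selection import train_test_split
from sklearn.tree import DecisionTreeRegressor
from sklearn.linear_model import LassoCV

rng = np.random.default_rng(42)
X = rng.normal(size=(200, 18))   # 18 features, like the df4 sample
y = rng.normal(size=200)         # target unrelated to the features

X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.1, random_state=0)

tree = DecisionTreeRegressor(random_state=0).fit(X_train, y_train)
lasso = LassoCV(cv=5).fit(X_train, y_train)

print("tree  test R^2:", tree.score(X_test, y_test))   # typically well below 0
print("lasso test R^2:", lasso.score(X_test, y_test))  # typically near 0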

To convince yourself that sklearn's r2_score function works properly and to get familiar with it, I would recommend that you first fit and predict your model on training data only (leave out the CV as well). For a linear model with an intercept, R2 can never be negative in this case. Also make sure that your models include an intercept term (wherever available).
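A quick sketch of that sanity check (random data; LinearRegression stands in for any of your linear models):

# In-sample check: OLS with an intercept cannot score below 0 on its own
# training data, even when the target is pure noise.
import numpy as np
from sklearn.linear_model import LinearRegression

rng = np.random.default_rng(1)
X = rng.normal(size=(100, 5))
y = rng.normal(size=100)               # even with a random target...

model = LinearRegression().fit(X, y)   # fit_intercept=True by default
print(model.score(X, y))               # ...in-sample R^2 is >= 0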