I'm not sure whether the problem is with my regression estimator models or with my understanding of what the r^2 measure of goodness of fit actually means. I am working on a project that uses scikit-learn and ~11 different regression estimators to produce (rough!) predictions of baseball fantasy performance. Certain models always fare better than others: Decision Tree Regression and Extra Tree Regression produce the worst r^2 scores, while ElasticNetCV and LassoCV produce the best, and every once in a while one of those might even come out slightly positive.
If a horizontal line produces an r^2 score of 0, then even if all my models were worthless, had literally zero predictive value, and were spitting out numbers entirely at random, shouldn't I still get small positive r^2 values sometimes through sheer dumb luck? Eight of the 11 estimators I use, despite running over different datasets hundreds of times, have never once produced even a tiny positive r^2.
Am I misunderstanding how r^2 works?
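To check my own reading of the metric, here is a toy example (purely synthetic numbers, nothing to do with my df4). A constant prediction equal to the mean of y_true scores exactly 0, while genuinely random guesses come out clearly negative rather than bouncing around 0:

import numpy as np
from sklearn.metrics import r2_score

rng = np.random.default_rng(0)
y_true = rng.normal(size=200)

# "Horizontal line": always predict the mean of y_true -> r^2 is exactly 0
mean_baseline = np.full_like(y_true, y_true.mean())
print(r2_score(y_true, mean_baseline))   # 0.0

# Completely random, unrelated predictions -> r^2 is typically around -1
random_guess = rng.normal(size=200)
print(r2_score(y_true, random_guess))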
I am not switching the argument order in sklearn's r2_score either; I have double-checked this many times. When I deliberately pass y_pred and y_true in the wrong order, it yields r^2 values that are hugely negative (< -50 big).
The fact that that's the case actually adds to my confusion about how r^2 is a measure of fit here, but I digress...
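For reference, the asymmetry itself is easy to reproduce on synthetic data; r2_score normalizes by the variance of whatever is passed as the first (y_true) argument, so putting low-variance predictions in that slot blows the score up:

import numpy as np
from sklearn.metrics import r2_score

rng = np.random.default_rng(1)
y_true = rng.normal(size=200)
# weak, low-variance predictions (made-up relationship, just for illustration)
y_pred = 0.1 * y_true + rng.normal(scale=0.05, size=200)

print(r2_score(y_true, y_pred))   # correct order: modest score, slightly positive here
print(r2_score(y_pred, y_true))   # swapped order: hugely negative, on the order of -50 or worse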
# I don't know whether I'm supposed to include my df4 or even a sample,
# but suffice it to say here is just a single row to show what kind of
# data we have. It is all normalized and/or z-scored.
"""
>> print(df4.head(1))
HomeAway ParkFactor Salary HandedVs Hand oppoBullpen \
Points
3.0 1.0 -1.229 -0.122111 1.0 0.0 -0.90331
RibRunHistory BibTibHistory GrabBagHistory oppoTotesRank \
Points
3.0 0.964943 0.806874 -0.224993 -0.846859
oppoSwipesRank oppoWalksRank Temp Precip WindSpeed \
Points
3.0 -1.40371 -1.159115 -0.665324 -0.380048 -0.365671
WindDirection oppoPositFantasy oppoFantasy
Points
3.0 0.229944 -1.011505 0.919269
"""
from sklearn.model_selection import train_test_split, cross_val_score
from sklearn.linear_model import ElasticNetCV
from sklearn.metrics import r2_score

def ElasticNetValidation(df4):
    # Features are the columns; the target ("Points") lives in the DataFrame index.
    X = df4.values
    y = df4.index
    # Hold out 10% for a single train/test r^2 check.
    X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.1)
    ENTrain = ElasticNetCV(cv=20)
    ENTrain.fit(X_train, y_train)
    y_pred = ENTrain.predict(X_test)
    # Separate model fit on all the data, then scored with 20-fold cross-validation.
    EN = ElasticNetCV(cv=20)
    ENModel = EN.fit(X, y)
    print('ElasticNet R^2: ' + str(r2_score(y_test, y_pred)))
    # cross_val_score uses the regressor's default scorer, which is r^2.
    scores = cross_val_score(ENModel, X, y, cv=20)
    print("ElasticNet Accuracy: %0.2f (+/- %0.2f)" % (scores.mean(), scores.std() * 2))
    return ENModel
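For completeness, I call it like this (df4 being the DataFrame whose single row is shown above):

ENModel = ElasticNetValidation(df4)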
When I run this estimator, along with the ten other regression estimators I have been experimenting with, both r2_score() and cross_val_score().mean() come out negative nearly every time. Certain estimators ALWAYS produce negative scores that are not even close to zero (decision tree regressor, extra tree regressor). Certain estimators fare better and sometimes even produce a tiny positive score, though never more than 0.01, and even those (ElasticNetCV, LassoCV, LinearRegression) are negative most of the time, albeit only slightly.
Even if these models I'm building are horrible, say they are totally random and have no predictive power whatsoever with respect to the target: shouldn't they still predict better than a plain horizontal line about as often as not? How is it that an unrelated model predicts WORSE than a horizontal line so consistently?
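To rule out something specific to my data, I also tried the comparison on pure noise (hypothetical shapes and names, just to mimic my setup): a "horizontal line" baseline via DummyRegressor and an ElasticNetCV, both scored with 20-fold cross-validation. Both mean scores typically come out slightly negative rather than hovering symmetrically around zero, which is exactly the pattern I'm asking about:

import numpy as np
from sklearn.dummy import DummyRegressor
from sklearn.linear_model import ElasticNetCV
from sklearn.model_selection import cross_val_score

rng = np.random.default_rng(42)
X_noise = rng.normal(size=(500, 18))   # same shape as my feature matrix, but pure noise
y_noise = rng.normal(size=500)         # target unrelated to the features

# cross_val_score uses r^2 for regressors; the dummy is the "horizontal line"
baseline = cross_val_score(DummyRegressor(strategy="mean"), X_noise, y_noise, cv=20)
model = cross_val_score(ElasticNetCV(cv=5), X_noise, y_noise, cv=20)

print("Dummy mean r^2: %0.3f" % baseline.mean())        # typically slightly below 0
print("ElasticNet mean r^2: %0.3f" % model.mean())      # also typically slightly below 0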