10

I am trying to do regression in TensorFlow. I'm not positive I am calculating R^2 correctly, as TensorFlow gives me a different answer than sklearn.metrics.r2_score. Can someone please look at my code below and let me know if I implemented the pictured equation correctly? Thanks

The formula I am attempting to create in TF (the pictured equation is the standard coefficient of determination, R^2 = 1 - SS_res / SS_tot):

total_error = tf.square(tf.sub(y, tf.reduce_mean(y)))
unexplained_error = tf.square(tf.sub(y, prediction))
R_squared = tf.reduce_mean(tf.sub(tf.div(unexplained_error, total_error), 1.0))
R = tf.mul(tf.sign(R_squared),tf.sqrt(tf.abs(R_squared)))
Matt Camp

6 Answers

10

What you are computing as "R^2" is

$$R^2_{\text{wrong}} = \operatorname{mean}_i \left( \frac{(y_i - \hat y_i)^2}{(y_i - \mu)^2} - 1 \right)$$

Compared to the given expression, you are aggregating in the wrong place: the errors should be summed first, and the division done once afterwards.

# tf.sub/tf.div were renamed tf.subtract/tf.divide in TF 1.x
unexplained_error = tf.reduce_sum(tf.square(tf.subtract(y, prediction)))
total_error = tf.reduce_sum(tf.square(tf.subtract(y, tf.reduce_mean(y))))
R_squared = tf.subtract(1.0, tf.divide(unexplained_error, total_error))
Innat
kennytm
    in tf.div in the third line, you have unexplained_error and total_error in the wrong positions, they need to be switched. – Nikhil Shinday Feb 22 '18 at 16:59
  • In your formulation, the (y_i - mu) should be squared. It was reflected in the code, but it might accidentally confuse some people (like me). – Rui Nian Jan 02 '19 at 17:29
6

I would strongly recommend against using a recipe to calculate this! The examples I've found do not produce consistent results, especially with just one target variable. This gave me enormous headaches!

The correct thing to do is to use tensorflow_addons.metrics.RSquare(). TensorFlow Addons is on PyPI here and the documentation is a part of TensorFlow here. All you have to do is set y_shape to the shape of your output, which is often (1,) for a single output variable.
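A minimal usage sketch, assuming TensorFlow 2.x with the tensorflow-addons package installed (the data here is hypothetical; the metric follows the usual Keras update_state/result pattern):

```python
import tensorflow as tf
import tensorflow_addons as tfa

# Hypothetical single-output regression targets and predictions.
y_true = tf.constant([[1.0], [2.0], [3.0], [4.0]])
y_pred = tf.constant([[1.1], [1.9], [3.2], [3.8]])

metric = tfa.metrics.RSquare(y_shape=(1,))  # one output variable
metric.update_state(y_true, y_pred)
print(metric.result().numpy())
```

Like any Keras metric, it can also be passed directly in model.compile(metrics=[...]).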

Furthermore... I would recommend against using R squared at all. It shouldn't be used with deep networks.

R2 tends to optimistically estimate the fit of the linear regression. It always increases as the number of effects are included in the model. Adjusted R2 attempts to correct for this overestimation. Adjusted R2 might decrease if a specific effect does not improve the model.

IBM Cognos Analytics on Adjusted R Squared
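The correction the quote describes has a simple closed form. A plain-Python sketch, using the standard notation (n samples, p predictors — these symbols are not from the quote itself):

```python
def adjusted_r2(r2, n, p):
    """Adjusted R^2: penalizes predictors that do not improve the fit."""
    return 1.0 - (1.0 - r2) * (n - 1) / (n - p - 1)

# Plain R^2 can only go up as predictors are added; the adjusted value
# drops when the gain is too small to justify the extra parameter.
print(adjusted_r2(0.98, n=100, p=5))
```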

rjurney
  • I agree with using the addon. For anyone using Colab, you will need to install the addons before your import them as they are not included by default: `!pip install tensorflow_addons` followed by your list of imports `import tensorflow_addons as tfa` – Josh Weston Nov 07 '21 at 23:24
  • I would _strongly_ recommend against using R^2 with a deep network unless you know what you are doing. > R2 tends to optimistically estimate the fit of the linear regression. It always increases as the number of effects are included in the model. Adjusted R2 attempts to correct for this overestimation. Adjusted R2 might decrease if a specific effect does not improve the model. https://www.ibm.com/docs/fi/cognos-analytics/11.1.0?topic=terms-adjusted-r-squared – rjurney Apr 02 '23 at 00:34
5

The function is given here:

def R_squared(y, y_pred):
  # SS_res: sum of squared residuals
  residual = tf.reduce_sum(tf.square(tf.subtract(y, y_pred)))
  # SS_tot: total sum of squares around the mean of y
  total = tf.reduce_sum(tf.square(tf.subtract(y, tf.reduce_mean(y))))
  r2 = tf.subtract(1.0, tf.divide(residual, total))  # tf.div was renamed tf.divide
  return r2

The concept is explained here.

Shashank
1

The other solutions won't produce the right R-squared score for a multidimensional y. The right way to calculate R2 (variance-weighted) in TensorFlow is:

unexplained_error = tf.reduce_sum(tf.square(labels - predictions))
total_error = tf.reduce_sum(tf.square(labels - tf.reduce_mean(labels, axis=0)))
R2 = 1.0 - tf.divide(unexplained_error, total_error)

The result from this TF snippet matches exactly the result from sklearn's:

from sklearn.metrics import r2_score
R2 = r2_score(labels, predictions, multioutput='variance_weighted')
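The equivalence can be sketched with plain NumPy on hypothetical 2-D data: centering each output column by its own mean (axis=0) and then summing over all elements reproduces the variance-weighted score.

```python
import numpy as np

# Hypothetical multi-output targets: 3 samples, 2 output dimensions.
labels = np.array([[1.0, 2.0], [2.0, 4.0], [3.0, 6.0]])
predictions = np.array([[1.0, 2.0], [2.0, 5.0], [3.0, 6.0]])

unexplained_error = np.sum((labels - predictions) ** 2)
# Center each output column by its own mean, as in the TF snippet above.
total_error = np.sum((labels - labels.mean(axis=0)) ** 2)
R2 = 1.0 - unexplained_error / total_error
print(R2)  # → 0.9
```

Dropping axis=0 (centering by the global mean instead) is the subtle bug that makes the single-mean recipes disagree with sklearn on multidimensional y.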
Eric Aya
Ivelin
0

It should actually be the other way around on the right-hand side: unexplained variance divided by total variance.

Pierre
0

I think it should be like this:

total_error = tf.reduce_sum(tf.square(tf.subtract(y, tf.reduce_mean(y))))
unexplained_error = tf.reduce_sum(tf.square(tf.subtract(y, prediction)))
R_squared = tf.subtract(1.0, tf.divide(unexplained_error, total_error))
Mingfei Sun