Python Dataframe: Calculating R^2 and RMSE Using Groupby on One Column

Question

I have the following Python dataframe:

Type    Actual  Predicted
A       4       3
A       10      18
A       13      11
B       3       10
B       4       2
B       8       33
C       20      17
C       40      33
C       87      80
C       32      30
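
To make the example reproducible, the table above can be rebuilt as a pandas DataFrame. The variable name `df` is an illustrative choice; the accepted answer refers to the same frame as `your_df`.

```python
import pandas as pd

# The example data from the question, one list per column
df = pd.DataFrame({
    "Type": ["A", "A", "A", "B", "B", "B", "C", "C", "C", "C"],
    "Actual": [4, 10, 13, 3, 4, 8, 20, 40, 87, 32],
    "Predicted": [3, 18, 11, 10, 2, 33, 17, 33, 80, 30],
})

print(df)
```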

I have code to calculate R^2 and RMSE, but I don't know how to calculate them separately for each distinct "Type".

For now, my approach is to split the larger table into three smaller tables (one each for A, B, and C), calculate R^2 and RMSE on each smaller table, and then append the results back together.

But this method is inefficient. Is there an easier way?

Below is the format I want the grouped results to have:

Type    R^2     RMSE    
A       value   value   
B       value   value   
C       value   value
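
For comparison, the split-compute-append approach described in the question might look roughly like the sketch below. The asker's own metric code isn't shown, so `r2` and `rmse` here are hypothetical NumPy stand-ins that use the standard definitions.

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({
    "Type": ["A", "A", "A", "B", "B", "B", "C", "C", "C", "C"],
    "Actual": [4, 10, 13, 3, 4, 8, 20, 40, 87, 32],
    "Predicted": [3, 18, 11, 10, 2, 33, 17, 33, 80, 30],
})

def r2(actual, predicted):
    # Hypothetical stand-in: coefficient of determination, 1 - SS_res / SS_tot
    actual = np.asarray(actual, dtype=float)
    predicted = np.asarray(predicted, dtype=float)
    ss_res = np.sum((actual - predicted) ** 2)
    ss_tot = np.sum((actual - actual.mean()) ** 2)
    return 1 - ss_res / ss_tot

def rmse(actual, predicted):
    # Hypothetical stand-in: square root of the mean squared error
    actual = np.asarray(actual, dtype=float)
    predicted = np.asarray(predicted, dtype=float)
    return np.sqrt(np.mean((actual - predicted) ** 2))

# Split into one smaller table per Type, compute both metrics on it,
# and collect one result row per Type
rows = []
for type_value in df["Type"].unique():
    sub = df[df["Type"] == type_value]
    rows.append({
        "Type": type_value,
        "R^2": r2(sub["Actual"], sub["Predicted"]),
        "RMSE": rmse(sub["Actual"], sub["Predicted"]),
    })

# Append the per-Type rows back together into the requested layout
result = pd.DataFrame(rows)
print(result)
```

This works, but it filters the whole frame once per Type; `groupby` performs the split for you, which is what the accepted answer relies on.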

Do a groupby and apply the formulas as a function across the column. – usernamenotfound, Dec 20 '17 at 21:24
Would you mind giving us the R^2 and RMSE formulas you have, so we can test this out? It's been a while since stats class for me (and maybe others). – MattR, Dec 20 '17 at 21:29
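
For reference, these are the standard definitions, which `sklearn.metrics.r2_score` and the square root of `sklearn.metrics.mean_squared_error` compute with their default settings. They apply to one group of n rows, with y_i the Actual values, ŷ_i the Predicted values, and ȳ the mean of Actual within that group:

```latex
R^2 = 1 - \frac{\sum_{i=1}^{n} (y_i - \hat{y}_i)^2}{\sum_{i=1}^{n} (y_i - \bar{y})^2},
\qquad
\mathrm{RMSE} = \sqrt{\frac{1}{n} \sum_{i=1}^{n} (y_i - \hat{y}_i)^2}
```

For Type A, for example, the squared residuals are 1, 64 and 4 (sum 69) and the squared deviations from ȳ = 9 are 25, 1 and 16 (sum 42), so R^2 = 1 - 69/42 ≈ -0.643 and RMSE = √(69/3) = √23 ≈ 4.796. R^2 is negative here because the predictions fit worse than simply predicting the group mean.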

score 38 · Accepted Answer · edited Jul 18 '22 at 06:57

Here is a groupby method:

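import numpy as np
import pandas as pd
from sklearn.metrics import r2_score, mean_squared_error

def r2_rmse(g):
    # g is the sub-DataFrame holding one Type's rows
    r2 = r2_score(g['Actual'], g['Predicted'])
    # RMSE is the square root of the mean squared error
    rmse = np.sqrt(mean_squared_error(g['Actual'], g['Predicted']))
    # Returning a Series gives one output column per key (r2, rmse)
    return pd.Series(dict(r2=r2, rmse=rmse))

# Selecting the two value columns keeps the grouping column out of g
# (recent pandas versions warn when apply() operates on the grouping column)
your_df.groupby('Type')[['Actual', 'Predicted']].apply(r2_rmse).reset_index()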

answered Dec 20 '17 at 21:30 by Tom · edited Jul 18 '22 at 06:57 by ah bon
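
A possible alternative, not part of the accepted answer: both metrics can be computed with vectorized groupby aggregations instead of calling a Python function per group. This avoids the scikit-learn dependency and may be faster when there are many groups. It is a sketch assuming the column names from the question; the output columns use the `R^2` / `RMSE` headers the question asked for.

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({
    "Type": ["A", "A", "A", "B", "B", "B", "C", "C", "C", "C"],
    "Actual": [4, 10, 13, 3, 4, 8, 20, 40, 87, 32],
    "Predicted": [3, 18, 11, 10, 2, 33, 17, 33, 80, 30],
})

# Per-row squared residual, and squared deviation of Actual from its group mean
group_mean = df.groupby("Type")["Actual"].transform("mean")
tmp = df.assign(
    sq_resid=(df["Actual"] - df["Predicted"]) ** 2,
    sq_dev=(df["Actual"] - group_mean) ** 2,
)

# Aggregate per Type: SS_res, SS_tot and the mean squared error
agg = tmp.groupby("Type").agg(
    ss_res=("sq_resid", "sum"),
    ss_tot=("sq_dev", "sum"),
    mse=("sq_resid", "mean"),
)

# R^2 = 1 - SS_res / SS_tot and RMSE = sqrt(MSE). If every Actual in a group
# is equal, SS_tot is 0 and R^2 comes out as -inf or NaN.
result = pd.DataFrame({
    "R^2": 1 - agg["ss_res"] / agg["ss_tot"],
    "RMSE": np.sqrt(agg["mse"]),
}).rename_axis("Type").reset_index()

print(result)
```

On the example data this gives one row per Type with the same values as the accepted answer's `r2` and `rmse` columns, only under different column names.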

This is amazing, thank you! Any tips on how to do this for a confidence interval? – PineNuts0 Dec 21 '17 at 05:05

The `return` statement can also be written as `return pd.Series({'r2': r2, 'rmse': rmse})`. – Abhilash Awasthi Aug 24 '20 at 13:37
