This question extends an earlier question, but now I want to add the residuals for separate groups.
How can I append the residuals to the data frame when running a separate regression for each group?
Here is the data frame:
df = pd.DataFrame({'gp': [1, 1, 1, 1, 1, 2, 2, 2, 2, 2],
                   'x1': [3.17, 4.76, 4.17, 8.70, 11.45, 3.17, 4.76, 4.17, 8.70, 11.45],
                   'x2': [23, 26, 73, 72, 16, 26, 73, 72, 16, 25],
                   'y': [880.37, 716.20, 974.79, 322.80, 1054.25, 980.37, 816.20, 1074.79, 522.80, 1254.25]},
                  index=np.arange(10, 30, 2))
Now, I want to run a separate regression for each of the two groups (gp) and append the residuals in a separate column.
I tried this code, but it only fills the residuals of the last regression group (gp=2):
import numpy as np
import pandas as pd
import statsmodels.formula.api as sm

def groupreg(df, regmodel):
    groups = df.groupby('gp')
    for item, group in groups:
        df['residual'] = sm.ols(formula=regmodel, data=group).fit().resid
    return df

regmodel = 'y ~ x1 + x2'
df = groupreg(df, regmodel)
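The likely reason only the last group survives is that assigning a Series to `df['residual']` replaces the whole column on every pass, and rows whose index labels are missing from that Series become NaN. Below is a minimal sketch of that behaviour. The toy frame and the group-mean deviations used in place of the OLS residuals are assumptions for illustration only, not part of the code above:

```python
import numpy as np
import pandas as pd

# Toy frame (hypothetical data) with a 'gp' column and a non-default index.
df = pd.DataFrame({'gp': [1, 1, 2, 2],
                   'y': [1.0, 3.0, 10.0, 14.0]},
                  index=[10, 12, 14, 16])

# Stand-in for .fit().resid: deviation from the group mean, one Series per group.
pieces = {key: g['y'] - g['y'].mean() for key, g in df.groupby('gp')}

# Whole-column assignment: each pass replaces the entire column, so rows
# outside the current group end up as NaN and only the last group remains.
whole = df.copy()
for key, piece in pieces.items():
    whole['residual'] = piece

# Row-targeted assignment: only the rows of the current group are written.
targeted = df.copy()
targeted['residual'] = np.nan
for key, piece in pieces.items():
    targeted.loc[piece.index, 'residual'] = piece

print(whole)
print(targeted)
```

If this reading is right, writing through `df.loc[group.index, 'residual']` inside the original loop should keep the residuals of both groups.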
I found one way to crack this problem, but the code is long and looks inefficient:
def groupreg2(df, regmodel):
    groups = df.groupby('gp')
    i = 0
    resname = 'residual'
    for item, group in groups:
        data = group.copy()
        data[resname] = sm.ols(formula=regmodel, data=data).fit().resid
        if i == 0:
            i = 1
            dout = data[resname].copy()
        else:
            dout = dout.append(data[resname].copy())
    df = pd.concat([df, dout], axis=1)
    return df

df = groupreg2(df, regmodel)
Any suggestions to improve?
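One possible shortening, sketched under assumptions: build one residual Series per group, concatenate them, and join the result back on the index. The helper `ols_resid` is hypothetical. It uses `np.linalg.lstsq` as a stand-in so the block runs without statsmodels, and it hard-codes the model `y ~ x1 + x2` with an intercept. Returning `sm.ols(formula=regmodel, data=g).fit().resid` from it instead should give the same residuals up to floating-point error.

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({'gp': [1, 1, 1, 1, 1, 2, 2, 2, 2, 2],
                   'x1': [3.17, 4.76, 4.17, 8.70, 11.45, 3.17, 4.76, 4.17, 8.70, 11.45],
                   'x2': [23, 26, 73, 72, 16, 26, 73, 72, 16, 25],
                   'y': [880.37, 716.20, 974.79, 322.80, 1054.25,
                         980.37, 816.20, 1074.79, 522.80, 1254.25]},
                  index=np.arange(10, 30, 2))

def ols_resid(g, ycol='y', xcols=('x1', 'x2')):
    """Residuals of y ~ 1 + x1 + x2 within one group (hypothetical helper)."""
    # Design matrix: a column of ones for the intercept, then the regressors.
    X = np.column_stack([np.ones(len(g))] +
                        [g[c].to_numpy(dtype=float) for c in xcols])
    y = g[ycol].to_numpy(dtype=float)
    beta, *_ = np.linalg.lstsq(X, y, rcond=None)
    # Keep the group's index labels so the result can be aligned with df.
    return pd.Series(y - X @ beta, index=g.index, name='residual')

# One Series per group, stacked into a single Series covering every row.
resid = pd.concat([ols_resid(g) for _, g in df.groupby('gp')])

# The join aligns on the index, so each residual lands on its own row.
out = df.join(resid)
print(out)
```

Collecting the pieces in a list and concatenating once avoids the `i == 0` bookkeeping and the repeated appends in `groupreg2`.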