This question extends an earlier question, but now I want to add the residuals for separate groups.

How can I append the residuals when running a separate regression for each group?

Here is the data frame:

df = pd.DataFrame({'gp': [1,1,1,1,1,2,2,2,2,2],
               'x1': [3.17, 4.76, 4.17, 8.70, 11.45, 3.17, 4.76, 4.17, 8.70, 11.45],
               'x2': [23, 26, 73, 72, 16, 26, 73, 72, 16, 25],
               'y': [880.37, 716.20, 974.79, 322.80, 1054.25, 980.37, 816.20, 1074.79, 522.80, 1254.25]},
               index=np.arange(10, 30, 2))

Now I want to run a separate regression for each of the two groups (gp) and append the residuals as a new column.

I tried this code, but it only fills the residuals of the last regression group (gp=2):

import numpy as np
import pandas as pd
import statsmodels.formula.api as sm

def groupreg(df, regmodel):
    groups = df.groupby('gp')
    for item, group in groups:
        df['residual'] = sm.ols(formula=regmodel, data=group).fit().resid
    return (df)

regmodel = 'y ~ x1 + x2'

df = groupreg(df,regmodel)

I found one way to crack this problem, but the code is long and looks inefficient:

def groupreg2(df, regmodel):
    groups = df.groupby('gp')
    i = 0
    resname='residual'
    for item, group in groups:
        data = group.copy()
        data[resname] = sm.ols(formula=regmodel, data=data).fit().resid
        if i == 0:
            i = 1
            dout = data[resname].copy()
        else:
            # Series.append was removed in pandas 2.0; pd.concat is the replacement
            dout = pd.concat([dout, data[resname]])

    df = pd.concat([df,dout],axis=1)
    return (df)

df = groupreg2(df,regmodel)

Any suggestions to improve?

Martien Lubberink

2 Answers

Simply turn your function into one that takes a single group, and pass it to groupby.apply(), which calls it once per gp:

def groupreg(g):
    g['residual'] = sm.ols(formula=regmodel, data=g).fit().resid
    return g

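# group_keys=False keeps the original index (pandas >= 2.0 otherwise adds gp as an index level)
df = df.groupby('gp', group_keys=False).apply(groupreg)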
print(df)

#     gp     x1  x2        y    residual
# 10   1   3.17  23   880.37  -43.579309
# 12   1   4.76  26   716.20 -174.532201
# 14   1   4.17  73   974.79  318.634921
# 16   1   8.70  72   322.80 -287.710952
# 18   1  11.45  16  1054.25  187.187542
# 20   2   3.17  26   980.37  174.295283
# 22   2   4.76  73   816.20 -173.045597
# 24   2   4.17  72  1074.79  101.623955
# 26   2   8.70  16   522.80 -372.840833
# 28   2  11.45  25  1254.25  269.967192
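As a variant that is not part of the original answer, the function passed to groupby.apply() can return only the residual Series instead of modifying the group it receives. The sketch below assumes the index of df is unique, so that the concatenated residuals align back to the right rows; the helper name group_resid is made up for illustration.

```python
import numpy as np
import pandas as pd
import statsmodels.formula.api as sm

df = pd.DataFrame({'gp': [1,1,1,1,1,2,2,2,2,2],
                   'x1': [3.17, 4.76, 4.17, 8.70, 11.45, 3.17, 4.76, 4.17, 8.70, 11.45],
                   'x2': [23, 26, 73, 72, 16, 26, 73, 72, 16, 25],
                   'y': [880.37, 716.20, 974.79, 322.80, 1054.25,
                         980.37, 816.20, 1074.79, 522.80, 1254.25]},
                  index=np.arange(10, 30, 2))

regmodel = 'y ~ x1 + x2'

def group_resid(g):
    # Hypothetical helper: fit OLS on one group and return its residuals.
    # The returned Series keeps the row index of the group.
    return sm.ols(formula=regmodel, data=g).fit().resid

# group_keys=False keeps the original row index on the combined result,
# so the assignment aligns each residual with its own row (assumes a unique index).
df['residual'] = df.groupby('gp', group_keys=False).apply(group_resid)
print(df)
```

Because the function never writes to the group, the original frame is only touched by the single column assignment at the end.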
Parfait

I found the issue with your first solution: you assign the result to the original DataFrame (df) on every iteration. Instead, you should assign it to the group, like this:

def groupreg(df, regmodel):
    groups = df.groupby('gp')
    for item, group in groups:
        group['residual'] = sm.ols(formula=regmodel, data=group).fit().resid
    return (df)

Just a slight change but it works as expected now.

Kacper Wolkowski
In my Python your suggestion does not work. This is what I receive when executing your code: "A value is trying to be set on a copy of a slice from a DataFrame. Try using .loc[row_indexer,col_indexer] = value instead" – Martien Lubberink, Jul 07 '17 at 02:45
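The warning quoted in the comment is consistent with each group yielded by groupby being a separate object from df, so a column assigned to the group never reaches the original frame. One possible loop-based fix, sketched here under the assumption that the index of df is unique, writes each group's residuals into a copy of the frame with .loc and the group's row labels. This is an illustration, not the answerer's code.

```python
import numpy as np
import pandas as pd
import statsmodels.formula.api as sm

def groupreg(df, regmodel):
    # Work on a copy so the caller's frame is left unchanged.
    out = df.copy()
    out['residual'] = np.nan
    for _, group in df.groupby('gp'):
        fit = sm.ols(formula=regmodel, data=group).fit()
        # Write only the rows of this group; .loc aligns on the row labels.
        out.loc[group.index, 'residual'] = fit.resid
    return out

df = pd.DataFrame({'gp': [1,1,1,1,1,2,2,2,2,2],
                   'x1': [3.17, 4.76, 4.17, 8.70, 11.45, 3.17, 4.76, 4.17, 8.70, 11.45],
                   'x2': [23, 26, 73, 72, 16, 26, 73, 72, 16, 25],
                   'y': [880.37, 716.20, 974.79, 322.80, 1054.25,
                         980.37, 816.20, 1074.79, 522.80, 1254.25]},
                  index=np.arange(10, 30, 2))

result = groupreg(df, 'y ~ x1 + x2')
print(result)
```

Restricting the assignment to group.index is what prevents a later group from overwriting the residuals of an earlier one, which is the symptom described in the question.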