Appending predicted residuals to pandas dataframe - by groups

Question

This question extends an earlier one, but now I want to add the residuals for separate groups.

So, how do you append the residuals when you run regressions for separate groups?

Here is the data frame:

df = pd.DataFrame({'gp': [1,1,1,1,1,2,2,2,2,2],
               'x1': [3.17, 4.76, 4.17, 8.70, 11.45, 3.17, 4.76, 4.17, 8.70, 11.45],
               'x2': [23, 26, 73, 72, 16, 26, 73, 72, 16, 25],
               'y': [880.37, 716.20, 974.79, 322.80, 1054.25, 980.37, 816.20, 1074.79, 522.80, 1254.25]},
               index=np.arange(10, 30, 2))

Now, I want to run a separate regression for each of the two groups (gp) and append the residuals in a separate column.

I tried this code, but it only fills the residuals of the last regression group (gp=2):

import numpy as np
import pandas as pd
import statsmodels.formula.api as sm

def groupreg(df, regmodel):
    groups = df.groupby('gp')
    for item, group in groups:
        df['residual'] = sm.ols(formula=regmodel, data=group).fit().resid
    return (df)

regmodel = 'y ~ x1 + x2'

df = groupreg(df,regmodel)

I found one way to crack this problem, but the code is long and looks inefficient:

def groupreg2(df, regmodel):
    groups = df.groupby('gp')
    i = 0
    resname='residual'
    for item, group in groups:
        data = group.copy()
        data[resname] = sm.ols(formula=regmodel, data=data).fit().resid
        if i == 0:
            i = 1
            dout = data[resname].copy()
        else:
            # Series.append was removed in pandas 2.0; pd.concat does the same job
            dout = pd.concat([dout, data[resname]])

    df = pd.concat([df,dout],axis=1)
    return (df)

df = groupreg2(df,regmodel)

Any suggestions to improve?

score 1 · Accepted Answer · answered Jul 07 '17 at 01:30

Simply turn your defined method into a groupby.apply() where you pass in each gp:

def groupreg(g):
    g['residual'] = sm.ols(formula=regmodel, data=g).fit().resid
    return g

# group_keys=False keeps the original index instead of prepending gp to it
df = df.groupby('gp', group_keys=False).apply(groupreg)
print(df)

#     gp     x1  x2        y    residual
# 10   1   3.17  23   880.37  -43.579309
# 12   1   4.76  26   716.20 -174.532201
# 14   1   4.17  73   974.79  318.634921
# 16   1   8.70  72   322.80 -287.710952
# 18   1  11.45  16  1054.25  187.187542
# 20   2   3.17  26   980.37  174.295283
# 22   2   4.76  73   816.20 -173.045597
# 24   2   4.17  72  1074.79  101.623955
# 26   2   8.70  16   522.80 -372.840833
# 28   2  11.45  25  1254.25  269.967192
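
How `groupby.apply` handles the group keys and the grouping column has varied across pandas versions, so here is a hedged alternative sketch that does not rely on it: fit one model per group, collect the residual Series, concatenate them, and let pandas align on the index when assigning. It assumes the index of `df` is unique. The helper name `group_residuals` and the alias `smf` are my own choices, not part of the question.

```python
import numpy as np
import pandas as pd
import statsmodels.formula.api as smf  # same module the question imports as `sm`

# Data frame from the question
df = pd.DataFrame({'gp': [1, 1, 1, 1, 1, 2, 2, 2, 2, 2],
                   'x1': [3.17, 4.76, 4.17, 8.70, 11.45, 3.17, 4.76, 4.17, 8.70, 11.45],
                   'x2': [23, 26, 73, 72, 16, 26, 73, 72, 16, 25],
                   'y': [880.37, 716.20, 974.79, 322.80, 1054.25,
                         980.37, 816.20, 1074.79, 522.80, 1254.25]},
                  index=np.arange(10, 30, 2))

def group_residuals(df, regmodel, by='gp'):
    """Hypothetical helper: one OLS fit per group, residuals returned as one Series."""
    pieces = [smf.ols(formula=regmodel, data=group).fit().resid
              for _, group in df.groupby(by)]
    # Each piece keeps the index labels of its group, so the result can be
    # aligned back onto df by index.
    return pd.concat(pieces)

# assign() aligns the residual Series with df on the index and returns a new frame
result = df.assign(residual=group_residuals(df, 'y ~ x1 + x2'))
print(result)
```

Because the assignment aligns on index labels, the row order of the concatenated residuals does not matter, but duplicate index labels would break the alignment.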

score 0 · Answer 2 · answered Jul 07 '17 at 00:12

The issue with your first solution is that you assign the result to the original dataframe (df); instead, you should assign it to the group, like this:

def groupreg(df, regmodel):
    groups = df.groupby('gp')
    for item, group in groups:
        group['residual'] = sm.ols(formula=regmodel, data=group).fit().resid
    return (df)

Just a slight change but it works as expected now.


Kacper Wolkowski


In my Python your suggestion does not work. This is what I receive when executing your code: "A value is trying to be set on a copy of a slice from a DataFrame. Try using .loc[row_indexer,col_indexer] = value instead" – Martien Lubberink Jul 07 '17 at 02:45
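
The warning quoted in the comment suggests that each `group` yielded by iterating over a `groupby` is a separate object, so a column assigned to it never reaches `df`. A minimal sketch of one way around that, assuming the index of `df` is unique: write each group's residuals back with `.loc`, using the group's own index labels. The function name `groupreg_loc` and its `by` and `resname` parameters are hypothetical.

```python
import numpy as np
import pandas as pd
import statsmodels.formula.api as smf  # same module the question imports as `sm`

# Data frame from the question
df = pd.DataFrame({'gp': [1, 1, 1, 1, 1, 2, 2, 2, 2, 2],
                   'x1': [3.17, 4.76, 4.17, 8.70, 11.45, 3.17, 4.76, 4.17, 8.70, 11.45],
                   'x2': [23, 26, 73, 72, 16, 26, 73, 72, 16, 25],
                   'y': [880.37, 716.20, 974.79, 322.80, 1054.25,
                         980.37, 816.20, 1074.79, 522.80, 1254.25]},
                  index=np.arange(10, 30, 2))

def groupreg_loc(df, regmodel, by='gp', resname='residual'):
    """Hypothetical variant of groupreg that writes residuals via .loc."""
    out = df.copy()          # leave the caller's frame untouched
    out[resname] = np.nan    # create the column before filling it group by group
    for _, group in df.groupby(by):
        resid = smf.ols(formula=regmodel, data=group).fit().resid
        # Target the output frame directly, selecting rows by the group's labels
        out.loc[group.index, resname] = resid
    return out

result = groupreg_loc(df, 'y ~ x1 + x2')
print(result)
```

Each pass fills only the rows belonging to the current group, so earlier groups are not overwritten the way they were in the first attempt from the question.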
