I have the following data:

group   exog    endog
A   1.2 0.23
A   1.3 0.34
A   1.4 0.45
B   1.5 0.56
B   1.6 0.67
B   1.7 0.78
C   1.8 0.89
C   1.9 1
C   2   1.11

I want to run a regression on each group separately, using a function like this:

def regression(df, exog, endog):
    import statsmodels.api as sm

    reg2 = sm.OLS(endog=df[exog],
                  exog=df[endog],
                  missing='drop')

    results = reg2.fit()

    df_ols_coefs = results.params.to_frame().T
    df_ols_coefs.columns = [str(col) + '_coef' for col in df_ols.columns]

    return df_ols_coefs

I thought about making "sub" dataframes from the original one, but I am stuck. Should I go for something like:

 for df in df_orginal:
   NOW I AM STUCK

? Explanation would really help me :D Thanks!

HeadOverFeet

2 Answers

You can use the flexible groupby.apply if you need a general function that processes each group:

def regression(df, exog, endog):
    import statsmodels.api as sm

    reg2 = sm.OLS(endog=df[exog], 
                  exog=df[endog], 
                  missing='drop')

    results = reg2.fit()

    df_ols_coefs = results.params.to_frame().T
    # the question's code had a typo here (df_ols); changed to df_ols_coefs
    df_ols_coefs.columns = [str(col) + '_coef' for col in df_ols_coefs.columns]

    return df_ols_coefs

df1 = df.groupby('group').apply(regression, 'exog', 'endog')
print(df1)
         endog_coef
group              
A     0    3.633423
B     0    2.361952
C     0    1.892071
jezrael
  • Hi @jezrael, I want to run regressions on the sub-DataFrames via statsmodels and save the coefficients of each regression into a general DataFrame where I want to store them. Do you think that will work with your code? Could you also explain apply? I am a bit lost with it. Thanks! – HeadOverFeet Aug 07 '18 at 17:56
  • @HeadOverFeet - I added a link to the docs, but it is easy: if you check `print (x)` for each loop, it returns a `DataFrame` per group. Also check the last red alert to see why the first group is returned twice. ;) – jezrael Aug 07 '18 at 17:59
  • @HeadOverFeet - Or [this](https://stackoverflow.com/a/49895227/2901002), it depends on what you need. – jezrael Aug 07 '18 at 18:04
  • Hi Jez, I am getting this error: 'DataFrame' objects are mutable, thus they cannot be hashed. Do you know what is wrong? The columns that the function returns are not the same as in the DataFrame. – HeadOverFeet Aug 07 '18 at 18:41
  • @HeadOverFeet - Is it possible to change the data sample so that your code can be run? Maybe there are more columns? – jezrael Aug 07 '18 at 18:54
  • Do you mean, like uploading it here? – HeadOverFeet Aug 07 '18 at 19:00
  • @HeadOverFeet - No, only add more columns to the data sample if necessary. – jezrael Aug 07 '18 at 19:00
  • I tried adding axis=1 to the apply call, since I want to add columns to the current df, but I get the same error. – HeadOverFeet Aug 07 '18 at 19:06
  • @HeadOverFeet - just tested, for me it works fine in pandas 0.23.1 – jezrael Aug 07 '18 at 19:12
All you need is:

for name, df in df_orginal.groupby('group'):
   print(name)
   # Do something with df

What you essentially do is group your data by the 'group' column and then iterate over the groups. The 'name' variable is the group label, for example 'A' (or 'B', or 'C'), and 'df' is a DataFrame containing all the rows that belong to that group.
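
To make the loop concrete, here is a rough sketch that does something with each sub-DataFrame and collects the results into one DataFrame. It uses `np.polyfit` as a lightweight stand-in for the statsmodels call from the question (that swap is my assumption, as are the names `rows` and `coefs`); any per-group computation could go in its place.

```python
import numpy as np
import pandas as pd

# Sample data copied from the question (variable name as used in the question)
df_orginal = pd.DataFrame({
    'group': list('AAABBBCCC'),
    'exog': [1.2, 1.3, 1.4, 1.5, 1.6, 1.7, 1.8, 1.9, 2.0],
    'endog': [0.23, 0.34, 0.45, 0.56, 0.67, 0.78, 0.89, 1.0, 1.11],
})

rows = {}
for name, sub in df_orginal.groupby('group'):
    # name is the group label; sub contains only the rows of that group
    # Stand-in for the regression: straight-line fit endog = slope * exog + intercept
    slope, intercept = np.polyfit(sub['exog'].to_numpy(),
                                  sub['endog'].to_numpy(), deg=1)
    rows[name] = {'slope': slope, 'intercept': intercept}

# One row per group, one column per stored coefficient
coefs = pd.DataFrame.from_dict(rows, orient='index')
coefs.index.name = 'group'
print(coefs)
```

Collecting plain dicts in the loop and building the DataFrame once at the end tends to be simpler than appending to a DataFrame on every iteration.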

Sergei