Iteration in the data frame pandas

Question

I have the following data:

group  exog  endog
A      1.2   0.23
A      1.3   0.34
A      1.4   0.45
B      1.5   0.56
B      1.6   0.67
B      1.7   0.78
C      1.8   0.89
C      1.9   1
C      2     1.11

I want to run a function on each group separately, like this:

def regression(df, exog, endog):
    import statsmodels.api as sm

    reg2 = sm.OLS(endog=df[exog],
                  exog=df[endog],
                  missing='drop')

    results = reg2.fit()

    df_ols_coefs = results.params.to_frame().T
    df_ols_coefs.columns = [str(col) + '_coef' for col in df_ols.columns]

    return df_ols_coefs

I thought about making "sub" dataframes from the original one, but I am stuck. Should I go for something like:

 for df in df_orginal:
   NOW I AM STUCK

? Explanation would really help me :D Thanks!

Can you explain in more detail what kind of processing you need? – jezrael Aug 07 '18 at 17:53

jezrael · Answer 1 · 2018-08-07T19:28:19.073

1

You can use the flexible groupby.apply if you need a general function for processing each group:

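def regression(df, exog, endog):
    import statsmodels.api as sm

    # NOTE: kept as in the question: the column named by `exog` is passed as
    # the dependent variable (endog=...) and the column named by `endog` as
    # the regressor, which is why the coefficient below is labelled endog_coef.
    reg2 = sm.OLS(endog=df[exog],
                  exog=df[endog],
                  missing='drop')

    results = reg2.fit()

    # One-row DataFrame with one column per fitted parameter
    df_ols_coefs = results.params.to_frame().T
    # The question used df_ols.columns here, which looks like a typo for df_ols_coefs
    df_ols_coefs.columns = [str(col) + '_coef' for col in df_ols_coefs.columns]

    return df_ols_coefs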

df1 = df.groupby('group').apply(regression, 'exog','endog')
print (df1)
         endog_coef
group              
A     0    3.633423
B     0    2.361952
C     0    1.892071
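
To make it clearer what apply does, here is a minimal sketch that leaves statsmodels out entirely. The function `describe_group` and its column names are made up for illustration. It shows that apply calls the function once per group, passes that group's rows as a DataFrame in the first argument, and forwards the extra positional arguments unchanged. Selecting the data columns before apply is my own choice here, to keep the grouping column out of what the function receives.

```python
import pandas as pd

# Sample data from the question
df = pd.DataFrame({
    "group": list("AAABBBCCC"),
    "exog": [1.2, 1.3, 1.4, 1.5, 1.6, 1.7, 1.8, 1.9, 2.0],
    "endog": [0.23, 0.34, 0.45, 0.56, 0.67, 0.78, 0.89, 1.0, 1.11],
})


def describe_group(sub_df, x_col, y_col):
    # Hypothetical helper: sub_df holds only the rows of one group;
    # x_col and y_col are the extra arguments given to apply().
    return pd.Series({
        "n_rows": len(sub_df),
        "mean_" + x_col: sub_df[x_col].mean(),
        "mean_" + y_col: sub_df[y_col].mean(),
    })


# One call of describe_group per group; the returned Series become the rows
# of the result, indexed by group label.
summary = df.groupby("group")[["exog", "endog"]].apply(
    describe_group, "exog", "endog"
)
print(summary)
```

Because each call returns a Series with the same labels, the per-group results are stacked into one DataFrame with one row per group.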

edited Aug 07 '18 at 19:28

answered Aug 07 '18 at 17:52

Hi @jezrael, I want to run regressions on the sub-DataFrames via statsmodels. For each sub-DataFrame I want to save the regression coefficients into a general DataFrame where I store them. Do you think that will work with your code? Also, could you please explain apply? I am a bit lost with it. Thanks! – HeadOverFeet Aug 07 '18 at 17:56
@HeadOverFeet - I added a link to the docs, but it is easy: if you check `print (x)` on each call, it returns a `DataFrame` per group. Also check the last red warning there, which explains why the first group is returned twice. ;) – jezrael Aug 07 '18 at 17:59
@HeadOverFeet - Or [this](https://stackoverflow.com/a/49895227/2901002), depending on what you need. – jezrael Aug 07 '18 at 18:04
Hi Jez, I am getting this error: 'DataFrame' objects are mutable, thus they cannot be hashed. Do you know what is wrong? The columns that the function returns are not the same as those in the dataframe. – HeadOverFeet Aug 07 '18 at 18:41
@HeadOverFeet - Is it possible to change the data sample so that your code can be run against it? Maybe there are more columns? – jezrael Aug 07 '18 at 18:54
Do you mean uploading it here? – HeadOverFeet Aug 07 '18 at 19:00
@HeadOverFeet - No, only add more columns to the data sample if necessary. – jezrael Aug 07 '18 at 19:00
I tried adding axis=1 to the apply call, since I want to add columns to the current df, but I get the same error. – HeadOverFeet Aug 07 '18 at 19:06
@HeadOverFeet - Just tested; it works fine for me in pandas 0.23.1. – jezrael Aug 07 '18 at 19:12

score 1 · Answer 2 · answered Aug 07 '18 at 18:05

All you need is:

for name, df in df_orginal.groupby('group'):
    # name is the group label, df holds only that group's rows
    print(name)
    # Do something with df

What you essentially do is group your data by the column 'group' and then iterate over the groups. The 'name' variable is the group label, for example 'A' (or 'B', or 'C'). 'df' is a DataFrame containing all the rows that belong to that group.
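
As a rough sketch of how the loop could collect one coefficient per group into a single DataFrame, assuming a regression of `endog` on `exog` without an intercept: the helper `slope_through_origin` is hypothetical, and it uses a plain NumPy least-squares formula in place of statsmodels so that it runs without extra dependencies. With statsmodels, the body of the helper would be the `sm.OLS(...).fit()` call from the question instead.

```python
import numpy as np
import pandas as pd

# Sample data from the question (variable name as used in the question)
df_orginal = pd.DataFrame({
    "group": list("AAABBBCCC"),
    "exog": [1.2, 1.3, 1.4, 1.5, 1.6, 1.7, 1.8, 1.9, 2.0],
    "endog": [0.23, 0.34, 0.45, 0.56, 0.67, 0.78, 0.89, 1.0, 1.11],
})


def slope_through_origin(sub_df, y_col, x_col):
    # Hypothetical helper: least-squares slope of y on x with no intercept,
    # i.e. sum(x * y) / sum(x * x), after dropping rows with missing values.
    clean = sub_df[[y_col, x_col]].dropna()
    x = clean[x_col].to_numpy()
    y = clean[y_col].to_numpy()
    return float(np.dot(x, y) / np.dot(x, x))


rows = []
for name, sub_df in df_orginal.groupby("group"):
    # One result row per group; name is the group label ('A', 'B', 'C')
    rows.append({
        "group": name,
        "exog_coef": slope_through_origin(sub_df, "endog", "exog"),
    })

# The "general" DataFrame holding the coefficient of every group
coefs = pd.DataFrame(rows).set_index("group")
print(coefs)
```

Building a list of plain dicts and creating the DataFrame once at the end tends to be simpler than appending to a DataFrame inside the loop.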
