I'm trying to run a multiple regression per group, following the instructions in the post "How to apply OLS from statsmodels to groupby". My code snippet is as follows:

for coins in df_raw.symbol.unique():
    tempdf = df_raw[df_raw.symbol == coins]
    y = (df_raw['Lagged return']).astype(float)
    x1 = (df_raw['Excess daily return']).astype(float)
    x2 = (df_raw['Excess weekly return']).astype(float)
    x3 = (df_raw['Excess monthly return']).astype(float)
    x4 = (df_raw['Trading vol / mkt cap']).astype(float)
    x5 = (df_raw['Std dev']).astype(float)
    x6 = (df_raw['Residual risk']).astype(float)
    result = smf.ols(formula='y ~ x1 + x2 + x3 + x4 + x5 + x6', data=df_raw).fit()
    print(result.params)
    print(result.summary())

However, when I run the regression, I get exactly the same result for every group in the dataframe, even though the underlying data differ between groups:

Intercept    0.010033
x1          -0.000214
x2          -0.000014
x3          -0.000094
x4          -0.001902
x5          -0.000009
x6          -0.000006

Is anyone able to advise where I'm going wrong? Thanks in advance!

Talkar81
  • Have you tried replacing `df_raw` with `tempdf`? Meaning `y = (df_raw['Lagged return']).astype(float)` should be `y = tempdf['Lagged return'].astype(float)` (same goes for x1, x2, x3...) – user3471881 Dec 13 '18 at 06:59
  • Thank you user3471881, your suggestion fixed the issue! Appreciate your help :) If you write this as an answer (rather than a comment), I will mark it as correct so you get the credit. Thanks! – Talkar81 Dec 13 '18 at 20:41
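
The fix from the comments can be sketched as follows. In the question's loop, `y` and `x1` to `x6` are built from the full `df_raw`, and `tempdf` is never used. Since `df_raw` has no columns named `y` or `x1`, the formula most likely resolves those names from the surrounding Python variables, so every iteration fits the same full-data regression. The sketch below uses a small synthetic `df_raw` with only two predictors (an assumption, since the real data is not shown) and a hypothetical `columns` mapping to rename the columns to formula-friendly names.

```python
import numpy as np
import pandas as pd
import statsmodels.formula.api as smf

# Synthetic stand-in for df_raw (assumption: the real data is not shown).
rng = np.random.default_rng(0)
n = 60
df_raw = pd.DataFrame({
    "symbol": np.repeat(["AAA", "BBB"], n),
    "Excess daily return": rng.normal(size=2 * n),
    "Excess weekly return": rng.normal(size=2 * n),
})
# Give each group a different true slope so the per-group fits should differ.
slope = np.where(df_raw["symbol"] == "AAA", 1.0, -2.0)
df_raw["Lagged return"] = (
    slope * df_raw["Excess daily return"] + rng.normal(scale=0.1, size=2 * n)
)

# Hypothetical mapping from the original column names to formula-friendly ones.
columns = {
    "Lagged return": "y",
    "Excess daily return": "x1",
    "Excess weekly return": "x2",
}

params = {}
for coin, tempdf in df_raw.groupby("symbol"):
    # Build the regression frame from this group's rows only.
    data = tempdf[list(columns)].rename(columns=columns).astype(float)
    result = smf.ols(formula="y ~ x1 + x2", data=data).fit()
    params[coin] = result.params

# One column of coefficients per symbol.
print(pd.DataFrame(params))
```

Passing the per-group frame as `data` keeps the formula's names tied to that group's columns, so the fit cannot silently fall back to variables built from the whole dataframe.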
