
I have a quick question about regression analysis in Python with pandas. Assume that I have the following dataset:

 Group      Y        X
  1         10       6
  1         5        4
  1         3        1
  2         4        6
  2         2        4
  2         3        9

My aim is to run a regression in which Y is the dependent variable and X is the independent variable. The issue is that I want to run this regression by Group and store the coefficients in a new dataset. So the results should look like:

 Group   Coefficient
   1        0.25 (let's assume that the coefficient is 0.25)
   2        0.30

I hope I have explained my question clearly. Many thanks in advance for your help.

Khalid
  • Just fit a separate regression model for each group... – MaxU - stand with Ukraine Apr 18 '18 at 08:45
  • Even if I have one million groups? – Khalid Apr 18 '18 at 08:47
  • Well, it depends on your goals... What are you going to do with those coefficients? – MaxU - stand with Ukraine Apr 18 '18 at 08:48
  • Why does my goal matter here? The important point is that I need all of these coefficients. Actually, I don't think I can run separate regressions; it is too time-consuming, since I have a huge number of groups. It's tick-by-tick data. – Khalid Apr 18 '18 at 08:51
  • Knowing your goals, we might give you better advice... Currently I don't see why you would need to do a regression in each group instead of doing it once for the whole data set, using `Group` and `X` as the input `X` data set and `Y` as your target. – MaxU - stand with Ukraine Apr 18 '18 at 08:55
  • Ah, I see. Let's say I have five hundred stocks and want to run a separate regression for each stock and see the coefficients for each stock. – Khalid Apr 18 '18 at 09:03

1 Answer


I am not sure which type of regression you need, but this is how you can run an OLS (ordinary least squares) regression for each group:

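import pandas as pd
import statsmodels.api as sm

def regress(data, yvar, xvars):
    # Fit an OLS model of yvar on xvars (plus an intercept) for one group
    Y = data[yvar]
    X = data[xvars].copy()  # copy so that adding a column does not touch a slice of df
    X['intercept'] = 1.
    result = sm.OLS(Y, X).fit()
    return result.params  # Series holding one coefficient per regressor


# This is what you need: df is the DataFrame from the question,
# and the result has one row of coefficients per group
df.groupby('Group').apply(regress, 'Y', ['X'])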

You can define your own regression function and pass its parameters through `apply`, as shown above.
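If the number of groups is very large, as raised in the comments, fitting one statsmodels model per group may be slow. For the special case of a single regressor with an intercept, the OLS slope has a closed form (the sum of cross-deviations divided by the sum of squared deviations of X), which can be computed for all groups at once. The following is only a sketch under that assumption; the function name `groupwise_simple_ols` and the output column names are made up for illustration.

```python
import pandas as pd

# Sample data from the question
df = pd.DataFrame({
    "Group": [1, 1, 1, 2, 2, 2],
    "Y": [10, 5, 3, 4, 2, 3],
    "X": [6, 4, 1, 6, 4, 9],
})

def groupwise_simple_ols(df, group, yvar, xvar):
    """Closed-form OLS of yvar = intercept + slope * xvar within each group.

    Hypothetical helper: it only covers one regressor plus an intercept.
    """
    grouped = df.groupby(group)
    # Deviations of each observation from its own group's mean
    x_dev = df[xvar] - grouped[xvar].transform("mean")
    y_dev = df[yvar] - grouped[yvar].transform("mean")
    # Per-group sums of cross-products and of squared X deviations
    sxy = (x_dev * y_dev).groupby(df[group]).sum()
    sxx = (x_dev ** 2).groupby(df[group]).sum()
    slope = sxy / sxx  # not finite if X is constant within a group
    intercept = grouped[yvar].mean() - slope * grouped[xvar].mean()
    return pd.DataFrame({"Coefficient": slope, "Intercept": intercept}).reset_index()

coefs = groupwise_simple_ols(df, "Group", "Y", "X")
print(coefs)
```

This returns one row per group with the columns `Group`, `Coefficient`, and `Intercept`. With several regressors, or if you also need standard errors and other diagnostics, the `groupby(...).apply(regress, ...)` approach is the more general option.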

iDrwish