I am trying to replicate this solution from "Python pandas: how to run multiple univariate regression by group", but using scikit-learn's LinearRegression instead of statsmodels.
import pandas as pd
import numpy as np
from sklearn.linear_model import LinearRegression
df = pd.DataFrame({
    'y': np.random.randn(20),
    'x1': np.random.randn(20),
    'x2': np.random.randn(20),
    'grp': ['a', 'b'] * 10})

def ols_res(x, y):
    return pd.Series(LinearRegression.fit(x,y).predict(x))

results = df.groupby('grp').apply(lambda x : x[['x1', 'x2']].apply(ols_res, y=x['y']))
print(results)
I get:
TypeError: ("fit() missing 1 required positional argument: 'y'", 'occurred at index x1')
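The TypeError comes from calling `fit` on the class instead of on an instance: `LinearRegression.fit(x, y)` binds `x` to `self` and `y` to `X`, so the real `y` argument is reported as missing. A second problem would appear right after that one: `DataFrame.apply` hands each column to the function as a 1-D Series, while scikit-learn expects a 2-D feature matrix. A minimal sketch of a corrected version, which instantiates the estimator and wraps each column with `to_frame()`. Selecting `[['y', 'x1', 'x2']]` after `groupby` is my own addition, to keep the grouping column out of `apply`:

```python
import numpy as np
import pandas as pd
from sklearn.linear_model import LinearRegression

df = pd.DataFrame({
    'y': np.random.randn(20),
    'x1': np.random.randn(20),
    'x2': np.random.randn(20),
    'grp': ['a', 'b'] * 10})

def ols_res(x, y):
    # Call fit on an instance: LinearRegression() rather than LinearRegression.
    # x is a 1-D Series (one column); sklearn needs a 2-D X, so use to_frame().
    X = x.to_frame()
    model = LinearRegression().fit(X, y)
    # Fitted values, reindexed 0..n-1 as in the statsmodels version.
    return pd.Series(model.predict(X))

results = (df.groupby('grp')[['y', 'x1', 'x2']]
             .apply(lambda g: g[['x1', 'x2']].apply(ols_res, y=g['y'])))
print(results)
```

`LinearRegression()` fits an intercept by default, so each column of `results` holds the fitted values of a separate simple regression of `y` on that column within each group.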
The results should match those of the linked solution, which are:
              x1        x2
grp
a   0  -0.102766 -0.205196
    1  -0.073282 -0.102290
    2   0.023832  0.033228
    3   0.059369 -0.017519
    4   0.003281 -0.077150
...          ...       ...
b   5   0.072874 -0.002919
    6   0.180362  0.000502
    7   0.005274  0.050313
    8  -0.065506 -0.005163
    9   0.003419 -0.013829
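One difference to keep in mind when comparing against statsmodels: `sm.OLS(y, x)` does not add an intercept unless the design matrix is built with `sm.add_constant`, whereas `LinearRegression` fits one by default. Assuming the linked solution regressed on the raw column without a constant, `fit_intercept=False` is the matching setting. Also, because the example data is drawn with `np.random.randn` and no seed, the exact numbers above cannot be reproduced. The sketch below uses a fixed seed (an assumption, purely for repeatability) and a hypothetical helper name, `ols_fitted`:

```python
import numpy as np
import pandas as pd
from sklearn.linear_model import LinearRegression

# Fixed seed (assumption, for repeatability only); the values will still
# differ from the output above, which came from unseeded random data.
rng = np.random.default_rng(0)
df = pd.DataFrame({
    'y': rng.standard_normal(20),
    'x1': rng.standard_normal(20),
    'x2': rng.standard_normal(20),
    'grp': ['a', 'b'] * 10})

def ols_fitted(x, y, fit_intercept=False):
    """Fitted values from regressing y on the single column x.

    fit_intercept=False mirrors sm.OLS(y, x) without sm.add_constant
    (assumed to be what the linked solution did).
    """
    X = x.to_frame()  # 2-D feature matrix for sklearn
    model = LinearRegression(fit_intercept=fit_intercept).fit(X, y)
    return pd.Series(model.predict(X))

results = (df.groupby('grp')[['y', 'x1', 'x2']]
             .apply(lambda g: g[['x1', 'x2']].apply(ols_fitted, y=g['y'])))
print(results)
```

Without an intercept, each fitted value is simply `beta * x` with `beta = (x @ y) / (x @ x)`, which gives a quick way to cross-check the sklearn output against plain NumPy or against statsmodels on the same seeded data.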