Suppose I have a `DataFrame` with one column of `y` variable and many columns of `x` variables. I would like to run multiple univariate regressions (`y` vs `x1`, `y` vs `x2`, etc.) and store the predictions back into the `DataFrame`. I also need to do this by a group variable.
```python
import numpy as np
import pandas as pd
import statsmodels.api as sm

df = pd.DataFrame({
    'y': np.random.randn(20),
    'x1': np.random.randn(20),
    'x2': np.random.randn(20),
    'grp': ['a', 'b'] * 10})

def ols_res(x, y):
    return sm.OLS(y, x).fit().predict()

# This does not work: apply passes only the group DataFrame, so y is never supplied
df.groupby('grp').apply(ols_res)
```
The code above obviously does not work. It is not clear to me how to pass the fixed `y` to the function while having `apply` iterate through the `x` columns (`x1`, `x2`, ...). I suspect there might be a clever one-line solution. Any ideas?