How to use a for loop with glm

Question

I'd like to use sentiment scores to predict each stock's return (stock1, stock2, and stock3). Please see the sample dataset below.

import pandas as pd

data = {"sentiment": [0.9, 0.75, 0.88, 0.23], "stock1": [0.0015, 0.034, -0.065, 0.015], "stock2": [0.023, -0.001, 0.0098, 0.072], "stock3": [-0.0052, 0.0083, 0.012, 0.094]}
sample = pd.DataFrame(data, columns=['sentiment', 'stock1', 'stock2', 'stock3'])
print(sample)

Instead of running the regression three times, I'd like to use a for loop to iterate over the three stock return columns. Here is my attempt:

diff_stock=['stock1','stock2','stock3']
for i in diff_stock:
    model=glm(formula='i ~ sentiment', data=sample, family=sm.families.Gaussian()).fit()
    print(model.summary())

However, I keep getting this error message:

PatsyError: Number of rows mismatch between data argument and i (3377 versus 1) i ~ favorite_count

It seems like there's only 1 value in i (the stock column), but I don't understand why...

score 0 · Answer 1 · answered Feb 04 '21 at 18:05

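Inside the quoted formula, i is just the literal name "i", not the value of the loop variable. The data has no column called i, so patsy falls back to the Python variable i, which is a single string, and that is why it reports 1 row. You need to construct the formula string from the value of i, for example: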

import statsmodels.api as sm
import statsmodels.formula.api as smf

for i in diff_stock:
    # i + ' ~ sentiment' evaluates to e.g. 'stock1 ~ sentiment'
    model = smf.glm(formula=i + ' ~ sentiment', data=sample,
                    family=sm.families.Gaussian()).fit()
    print(model.summary())

answered Feb 04 '21 at 18:05

StupidWolf
