I recently moved to Python for data analysis and I am stuck on the basics. I am trying to recover the coefficients of the following model by regression: z = 20 + x + 3*y + noise. I get the right intercept, but the fitted x and y coefficients both appear to be an average of the true x and y parameters. What am I doing wrong? Code below:
import numpy as np
import pandas as pd
import statsmodels.formula.api as smf
# generate true values, and noise around them
np.random.seed(5)
x = np.arange(1, 101)
y = np.arange(1, 101)
z = 20 + x + 3 * y + np.random.normal(0, 20, 100)
data = pd.DataFrame({'x': x, 'y': y, 'z': z})
lm = smf.ols(formula='z ~ x + y', data=data).fit()
# print the coefficients
lm.summary()
This returns a summary in which the x and y coefficients are both 1.5, instead of 1 and 3. What's wrong?
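In case it helps narrow things down, here is a minimal NumPy-only sanity check of the same setup. It is a sketch and not part of my original code: the design matrix `X`, the use of `np.linalg.lstsq` in place of `smf.ols`, and the correlation check are all assumptions added for illustration. It fits the same model by least squares and also reports the rank of the design matrix and how strongly x and y are correlated.

```python
import numpy as np

# Same data as in the question
np.random.seed(5)
x = np.arange(1, 101)
y = np.arange(1, 101)
z = 20 + x + 3 * y + np.random.normal(0, 20, 100)

# Design matrix with an explicit intercept column: [1, x, y]
# (an assumption standing in for the 'z ~ x + y' formula)
X = np.column_stack([np.ones(len(x)), x, y])

# Least-squares fit; lstsq also reports the rank of the design matrix
coef, _, rank, _ = np.linalg.lstsq(X, z, rcond=None)

# Correlation between the two regressors
xy_corr = np.corrcoef(x, y)[0, 1]

print("intercept, x, y coefficients:", coef)
print("rank of design matrix:", rank, "of", X.shape[1], "columns")
print("corr(x, y):", xy_corr)
```

If the reported rank is smaller than the number of columns, the x and y coefficients cannot be identified separately from this data, which would be worth ruling out before looking for a problem in the statsmodels call itself.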