I recently moved to Python for data analysis and I am stuck on the basics. I am trying to recover the parameters of the following expression by regression: z = 20 + x + 3*y + noise. I get the right intercept, but the fitted x and y coefficients are apparently an average of the true x and y parameters. What am I doing wrong? Code below:

import numpy as np
import pandas as pd
import statsmodels.formula.api as smf 

# generate true values, and noise around them
np.random.seed(5)
x = np.arange(1, 101)
y = np.arange(1, 101)
z = 20 + x + 3 * y + np.random.normal(0, 20, 100)

data = pd.DataFrame({'x':x, 'y':y, 'z': z})

lm = smf.ols(formula='z ~ x + y', data=data).fit()

# print the fitted coefficients and summary statistics
print(lm.summary())

returns

[image: statsmodels OLS regression summary]

where the x and y parameters are both 1.5, instead of being 1 and 3. What's wrong?

famargar
Your x and y are the same series. With perfectly collinear explanatory variables you get a pinv-regularized solution. – Josef Mar 12 '17 at 12:31
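To illustrate the comment above, here is a minimal sketch (not the asker's exact setup) that fits the same formula twice: once with y identical to x, as in the question, and once with y drawn independently. The names `collinear`, `independent`, `y_indep`, and `design_rank` are hypothetical, and using a uniform distribution for the independent y is an assumption made only for the demonstration.

```python
import numpy as np
import pandas as pd
import statsmodels.formula.api as smf

rng = np.random.default_rng(5)
n = 100
x = np.arange(1, n + 1)
noise = rng.normal(0, 20, n)

# Case 1: y is the same series as x, as in the question.
# The design matrix [1, x, y] then has only two independent columns.
collinear = pd.DataFrame({'x': x, 'y': x, 'z': 20 + x + 3 * x + noise})
fit_collinear = smf.ols('z ~ x + y', data=collinear).fit()
design_rank = np.linalg.matrix_rank(fit_collinear.model.exog)

# Case 2: y drawn independently of x (assumption: uniform on [1, 100]).
y_indep = rng.uniform(1, 100, n)
independent = pd.DataFrame({'x': x, 'y': y_indep,
                            'z': 20 + x + 3 * y_indep + noise})
fit_independent = smf.ols('z ~ x + y', data=independent).fit()

print("rank of collinear design matrix:", design_rank)
print(fit_collinear.params)    # x and y share the combined slope equally
print(fit_independent.params)  # x and y slopes are estimated separately
```

In the collinear case only the sum of the two slopes is identified, so the pseudo-inverse solution splits it evenly between x and y. Once y varies independently of x, the two slopes can be estimated separately.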

0 Answers