0

I built a multiple regression model using Python statsmodels.

import statsmodels.api as sm

X = df[['var1','var2','var3','var4']]
X = sm.add_constant(X) ## let's add an intercept (beta_0) to our model
y = df['target_trait']

model = sm.OLS(y, X).fit() #argument order: sm.OLS(output, input), see (https://towardsdatascience.com/simple-and-multiple-linear-regression-in-python-c928425168f9)
predictions = model.predict(X)
model.summary()

Now I want to predict on new data. The DataFrame for my new data has 4 columns (var1, var2, var3, var4) and 143 rows. Below is how I proceeded.

X_new = df_new[['var1','var2','var3','var4']] #df_new has other variables not to be used. I am extracting the relevant variables.
y_new = model.predict(X_new)
y_new

Running the code above gave me: ValueError: shapes (143,4) and (5,) not aligned: 4 (dim 1) != 5 (dim 0). I am not sure how to fix it. I would really appreciate your help. Thank you in advance for your time.

Amilovsky

2 Answers

1

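I think I found the issue. When fitting the model, I added a constant to the X matrix with X = sm.add_constant(X), so the fitted model has 5 parameters (the intercept plus 4 coefficients), while X_new had only 4 columns. Applying the same step to X_new, X_new = sm.add_constant(X_new), made the prediction work. Anyway, thank you for taking a look.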

Amilovsky
0

Your result matrix y (df['target_trait']) has no dimension (dim 0), indicating that you tried to pass in 4 columns and 143 rows of the x variables with no y result.

Ethan F.