I get an error when I call statsmodels' .predict to predict values for my test set.

Code:

X_train, X_test, y_train, y_test = train_test_split(X_new_np, y, test_size=0.2, random_state=42)
logit = sm.Logit(y_train, X_train)
reg = logit.fit_regularized(start_params=None, method='l1_cvxopt_cp', maxiter= 1000, full_output=1, disp=1, callback=None, alpha=.01, trim_mode='auto', auto_trim_tol=0.01, size_trim_tol=0.0001, qc_tol=0.03)
reg.summary()
y_pred_test = logit.predict(X_test)

Error:

ValueError: shapes (1000,61) and (251,61) not aligned: 61 (dim 1) != 251 (dim 0)
thepunitsingh
mpollinger
  • A full traceback would help. But this kind of error is raised by `np.dot` when the dimensions aren't right. The 2nd argument should have shape (61,251), as indicated by the error message. How that traces back to your code has to be deduced from the traceback. – hpaulj Dec 20 '20 at 19:42
  • Thanks! Yes I do understand that it's a linear algebra issue that the matrices can't be multiplied together because the inner dimensions don't match. I just don't have a clue as to why they don't match. – mpollinger Dec 20 '20 at 20:17
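
The shape rule discussed in the comments can be reproduced with NumPy alone. This is a small sketch with zero-filled placeholder arrays (the names `a` and `b` are illustrative, not from the question); only the shapes match the error message.

```python
import numpy as np

a = np.zeros((1000, 61))  # same shape as the first operand in the error
b = np.zeros((251, 61))   # same shape as the second operand in the error

# Inner dimensions differ (61 vs 251), so np.dot refuses to multiply.
dot_failed = False
try:
    np.dot(a, b)
except ValueError as err:
    dot_failed = True
    print("ValueError:", err)

# With the second operand transposed to (61, 251) the product is defined.
product = np.dot(a, b.T)
print(product.shape)
```

This only shows why the shapes clash; it does not explain which call in the question passes the wrong array.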

1 Answer

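You are calling predict on the wrong object. logit is the unfitted model, and its predict method expects the parameter vector as its first argument, so logit.predict(X_test) treats X_test as the coefficients and tries to multiply the training matrix (1000, 61) by it (251, 61). That is where the shape error comes from. reg is the fitted results object returned by fit_regularized, so you should call reg.predict(X_test) instead. The following code runs without error (I used your fit_regularized parameters).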

from sklearn.model_selection import train_test_split
import numpy as np
from statsmodels.api import Logit

x = np.random.randn(100,50)
y = np.random.randint(0,2,100).astype(bool)

print(x.shape, y.shape)

X_train, X_test, y_train, y_test = train_test_split(x, y, test_size=.2)

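# logit is the unfitted model; fit_regularized returns the fitted results object.
# Note: method='l1_cvxopt_cp' requires the cvxopt package to be installed.
logit = Logit(y_train, X_train)
reg = logit.fit_regularized(start_params=None, method='l1_cvxopt_cp',
        maxiter=1000, full_output=1, disp=1, callback=None,
        alpha=.01, trim_mode='auto', auto_trim_tol=0.01,
        size_trim_tol=0.0001, qc_tol=0.03)
print(reg.summary())
# Predict from the fitted results (reg), not from the model (logit).
y_pred_test = reg.predict(X_test)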

tanglef