
I trained a model on one dataset and am trying to predict with a different dataset, but I get an error.

I've tried changing the parameters, but it makes no difference.

X_train, X_test, y_train, y_test = train_test_split(X, y, test_size = 0.2, random_state=77)
X_train.shape, X_test.shape, y_train.shape, y_test.shape

((15484, 4587), (3871, 4587), (15484,), (3871,))

nb = MultinomialNB(alpha=0.01)
mnb = nb.partial_fit(X_train, y_train, classes)

Then I split my second dataset:

X_train3, X_test3, y_train3, y_test3 = train_test_split(X3, y3, test_size = 0.99999, random_state=77)
X_train3.shape, X_test3.shape, y_train3.shape, y_test3.shape

((0, 1445), (4155, 1445), (0,), (4155,))

y_pred=mnb.predict(X_test3)

ValueError: shapes (4155,1445) and (4587,7) not aligned: 1445 (dim 1) != 4587 (dim 0)

I expected the model to be able to predict on my second dataset. Any help is appreciated. Thanks!

nstrtm
  • Why do you expect your model to work with the other dataset? It has a different number of features (1445 instead of 4587 in the first dataset), and the classifier you chose needs the same number of features at prediction time as at training time. – sjwilczynski Jun 16 '19 at 10:00
  • After I built a model with the first dataset (train + validation), I just want to test it with my second dataset. Any suggestions? – nstrtm Jun 17 '19 at 04:00
  • As written in the accepted answer, you can't use the other dataset to test your model because the numbers of features don't match. Are you using your validation set to choose between different models? If it is unused, you can use it as a test set and evaluate your model's performance on it. If you do use it for model selection, you can instead split your first dataset into three parts (train, validation and test, in a 60:20:20 ratio) and evaluate performance only on the test set. – sjwilczynski Jun 17 '19 at 08:24
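
The 60:20:20 split suggested in the last comment can be sketched with two calls to train_test_split. This is only a minimal illustration: the toy X and y arrays below are made-up stand-ins for the first dataset, and the variable names are assumptions rather than anything from the question.

```python
import numpy as np
from sklearn.model_selection import train_test_split

# Hypothetical toy data standing in for the first dataset (100 rows, 2 features).
X = np.arange(200).reshape(100, 2)
y = np.arange(100) % 2

# First split: 60% for training, 40% held back.
X_train, X_rest, y_train, y_rest = train_test_split(
    X, y, test_size=0.4, random_state=77
)

# Second split: divide the held-back 40% evenly into validation and test (20% each).
X_val, X_test, y_val, y_test = train_test_split(
    X_rest, y_rest, test_size=0.5, random_state=77
)

print(X_train.shape, X_val.shape, X_test.shape)
```

The validation part is then used to choose between models, and the test part is touched only once, for the final performance estimate.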

1 Answer


Have a look at the scikit-learn documentation for MultinomialNB.

The input passed to model.predict() for testing or scoring must have the same structure as the input used to train the model with model.fit().

This means you cannot apply the same model to a dataset with a different feature set. It only works if both datasets have the same features: the same number of features, in the same order as in the training data.

In your case this will not work, because the two datasets differ, as their shapes show:

Set 1 has 4587 features
Set 2 has 1445 features

Make sure both datasets have the same number of features, in the same order as in the training set.
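
One common way to get matching features is sketched below. It assumes the features come from a text vectorizer such as CountVectorizer, which the question does not state, and the texts and labels are invented toy data. The idea is to fit the vectorizer on the training data only and then reuse that same fitted vectorizer to transform the second dataset.

```python
from sklearn.feature_extraction.text import CountVectorizer
from sklearn.naive_bayes import MultinomialNB

# Hypothetical toy data standing in for the two datasets.
train_texts = [
    "cheap pills now",
    "meeting at noon",
    "cheap offer today",
    "lunch meeting tomorrow",
]
train_labels = ["spam", "ham", "spam", "ham"]
second_texts = ["cheap pills offer", "project meeting at noon"]

# Fit the vectorizer on the training data only; this fixes the feature set.
vectorizer = CountVectorizer()
X_train = vectorizer.fit_transform(train_texts)

model = MultinomialNB(alpha=0.01)
model.fit(X_train, train_labels)

# Reuse the same fitted vectorizer (transform, not fit_transform), so the
# second dataset gets the same columns in the same order. Words that were
# not seen during training are ignored.
X_second = vectorizer.transform(second_texts)

print(X_train.shape, X_second.shape)  # column counts match
y_pred = model.predict(X_second)
print(y_pred)
```

If the features were built some other way, the same principle applies: whatever transformation produced the training columns has to be applied, unchanged, to the second dataset before calling predict().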

skillsmuggler