
I am trying to predict my test set using a GaussianNB classifier with Dask. This is what my setup looks like:

X_train = pd.DataFrame.sparse.from_spmatrix(vectorizer.fit_transform(training['X_trn']))
y_train = encoder.fit_transform(training['y_trn'])
X_tst = pd.DataFrame.sparse.from_spmatrix(vectorizer.transform(testing['X_tst']))
y_tst = encoder.transform(testing['y_tst'])

clf = GaussianNB()
clf.fit(X_train, y_train)
clf.predict(X_tst)

All my X and y variables are Dask DataFrames, but I get the following error:

AssertionError: length mismatch: 20 vs. 824

I was careful to use fit_transform on my training set and transform on my test set, but I've had no luck.
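
To illustrate the contract being relied on here, the following is a minimal sketch using a toy vectorizer. `ToyCountVectorizer` and the sample documents are hypothetical stand-ins, not the vectorizer or data from the setup above; the sketch only assumes that the real vectorizer learns its columns during fitting and reuses them in transform, as scikit-learn-style vectorizers do.

```python
# Toy stand-in for a text vectorizer (hypothetical, not a real library class).
class ToyCountVectorizer:
    def fit(self, docs):
        # Learn the vocabulary (one column per distinct token) from the given docs.
        tokens = sorted({tok for doc in docs for tok in doc.split()})
        self.vocabulary_ = {tok: i for i, tok in enumerate(tokens)}
        return self

    def transform(self, docs):
        # Count tokens using the learned vocabulary; unseen tokens are ignored.
        rows = []
        for doc in docs:
            row = [0] * len(self.vocabulary_)
            for tok in doc.split():
                idx = self.vocabulary_.get(tok)
                if idx is not None:
                    row[idx] += 1
            rows.append(row)
        return rows

    def fit_transform(self, docs):
        return self.fit(docs).transform(docs)


# Made-up sample data, for illustration only.
train_docs = ["red apple", "green apple", "red car"]
test_docs = ["green car", "blue apple"]

vec = ToyCountVectorizer()
X_train = vec.fit_transform(train_docs)  # vocabulary learned here
X_test = vec.transform(test_docs)        # same columns reused here

n_train_cols = len(X_train[0])
n_test_cols = len(X_test[0])
print(n_train_cols, n_test_cols)
```

In the real setup, the equivalent check would be comparing X_train.shape[1] with X_tst.shape[1]. If those already agree, the mismatch probably comes from somewhere other than the vectorizer.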

M_x
mendy

0 Answers