I am trying to make predictions on my test set using a GaussianNB classifier with Dask. This is what my setup looks like:
X_train = pd.DataFrame.sparse.from_spmatrix(vectorizer.fit_transform(training['X_trn']))
y_train = encoder.fit_transform(training['y_trn'])
X_tst = pd.DataFrame.sparse.from_spmatrix(vectorizer.transform(testing['X_tst']))
y_tst = encoder.transform(testing['y_tst'])
clf = GaussianNB()
clf.fit(X_train, y_train)
clf.predict(X_tst)
All my X and y variables are Dask DataFrames, but I get the following error:
AssertionError: length mismatch: 20 vs. 824
I was careful to use fit_transform on my training set and transform on my test set, but I've had no luck.
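The train/test pattern described above can be sketched with a toy example. This is only an illustration under stated assumptions: `ToyVectorizer` is a hypothetical pure-Python stand-in, not the vectorizer from the setup above, and the documents are made up. It shows that when the vocabulary is learned only from the training text and reused for the test text, both matrices end up with the same number of columns.

```python
# ToyVectorizer is a hypothetical stand-in for a bag-of-words vectorizer;
# it is NOT the vectorizer used in the setup above.
class ToyVectorizer:
    def fit(self, docs):
        # Learn the vocabulary: one column per distinct token in docs.
        tokens = sorted({tok for doc in docs for tok in doc.split()})
        self.vocabulary_ = {tok: i for i, tok in enumerate(tokens)}
        return self

    def transform(self, docs):
        # Count tokens against the already-learned vocabulary.
        # Tokens never seen during fit are silently dropped.
        rows = []
        for doc in docs:
            row = [0] * len(self.vocabulary_)
            for tok in doc.split():
                col = self.vocabulary_.get(tok)
                if col is not None:
                    row[col] += 1
            rows.append(row)
        return rows

    def fit_transform(self, docs):
        return self.fit(docs).transform(docs)


# Made-up example documents (assumptions for illustration only).
train_docs = ["red apple", "green apple", "red cherry"]
test_docs = ["green cherry", "blue apple"]

vec = ToyVectorizer()
X_train = vec.fit_transform(train_docs)  # vocabulary is learned here
X_test = vec.transform(test_docs)        # same vocabulary is reused here

n_train_cols = len(X_train[0])
n_test_cols = len(X_test[0])
print(n_train_cols, n_test_cols)
```

Since the column counts agree in this pattern, a length mismatch like the one above may come from somewhere other than the fit/transform calls, for example from how the resulting frames are wrapped or partitioned.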