I am trying to fit a model with XGBoost on dask.

Here is the code I'm using to fit the model:

clf = xgboost.dask.DaskXGBClassifier(**params)
clf.client = client
clf.fit(X_train, Y_train)

It is modeled after the example in the "Scikit-learn interface" section of the documentation: https://xgboost.readthedocs.io/en/stable/tutorials/dask.html

X_train is a dask dataframe with 100 partitions and Y_train is a dask series. Both have 39569 rows, but I still get this error:

Check failed: labels.size() == num_row_ (1784 vs 1980) : Size of labels must equal to number of rows.

Even more puzzling, the exact same code works on smaller datasets: 20, 50, 100, and 1000 rows all work with partitions = 1. But using a single partition defeats the point of using dask in the first place. Maybe the partitioning is why this error occurs?
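
The two numbers in the message (1784 vs 1980) are far smaller than 39569, which suggests the check may be applied to a single partition rather than to the whole dataset. Below is a minimal sketch of how per-partition row counts could be compared. `find_partition_mismatches` is a hypothetical helper (not part of dask or xgboost), and the lengths are made-up values for illustration, not taken from my data:

```python
from typing import List, Tuple


def find_partition_mismatches(
    x_lengths: List[int], y_lengths: List[int]
) -> List[Tuple[int, int, int]]:
    """Return (partition_index, x_rows, y_rows) for each partition whose row counts differ."""
    if len(x_lengths) != len(y_lengths):
        raise ValueError(
            f"different number of partitions: {len(x_lengths)} vs {len(y_lengths)}"
        )
    return [
        (i, nx, ny)
        for i, (nx, ny) in enumerate(zip(x_lengths, y_lengths))
        if nx != ny
    ]


# With dask, the per-partition lengths could be collected like this:
#   x_lengths = X_train.map_partitions(len).compute().tolist()
#   y_lengths = Y_train.map_partitions(len).compute().tolist()
# Illustrative (made-up) lengths: the totals match, the partitions do not.
x_lengths = [1980, 2000, 1900]
y_lengths = [1784, 2000, 2096]

mismatches = find_partition_mismatches(x_lengths, y_lengths)
print(mismatches)
```

If this reports mismatches on the real data, the features and labels have equal totals but are split at different row boundaries, which would be consistent with the error above.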

Any idea why this is happening and how to fix it?