
I've connected to my remote cluster via Client, and now I'm trying to use Dask-ML:

from sklearn.ensemble import RandomForestClassifier
from sklearn.externals import joblib
# import dask_ml.joblib

clf = RandomForestClassifier(n_estimators=200, n_jobs=-1)

with joblib.parallel_backend('dask', scatter=[X, y]):
    clf.fit(X, y)

Error 1) There is no dask_ml.joblib module; importing it raises a "module does not exist" error.

Error 2) If I remove this import, I get a "streaming connection closed" error.

I'm not seeing any good documentation on this. Any ideas on how to get Dask-ML to work with a remote cluster?

kaysuez

1 Answer

  1. Error 1

dask_ml.joblib has been removed. You now only need to create a Client and then use joblib.parallel_backend('dask').

  2. Error 2

This might be a spill-to-disk issue, with the workers running short of memory. Try reducing the size of your dataframe and check whether the error persists.
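For Error 1, a minimal sketch of the current pattern could look like the following. It assumes joblib is imported directly (sklearn.externals.joblib was deprecated and later removed from scikit-learn) and uses a small in-process cluster plus synthetic data as stand-ins for the remote cluster and the real X and y. The helper name fit_with_dask is hypothetical and only there for illustration.

```python
import joblib
from dask.distributed import Client
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier


def fit_with_dask(X, y, n_estimators=20):
    """Fit a random forest while joblib dispatches its tasks to Dask.

    Hypothetical helper for illustration. A dask.distributed Client must
    already exist; the 'dask' backend looks up the current client.
    """
    clf = RandomForestClassifier(
        n_estimators=n_estimators, n_jobs=-1, random_state=0
    )
    # No dask_ml.joblib import is needed for the 'dask' backend.
    with joblib.parallel_backend("dask"):
        clf.fit(X, y)
    return clf


if __name__ == "__main__":
    # Assumption: an in-process cluster stands in for the remote one.
    # For a real cluster, pass your scheduler address to Client instead.
    client = Client(processes=False)
    # Assumption: small synthetic data in place of the real X and y.
    X, y = make_classification(n_samples=500, n_features=10, random_state=0)
    clf = fit_with_dask(X, y)
    print(len(clf.estimators_))
    client.close()
```

Starting with a small dataset like this also helps with Error 2: if the fit succeeds on a sample but fails on the full data, worker memory is a likely suspect.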

I know you might have already solved your problem, but this answer might help other people.

Subrat Sahu