Questions tagged [dask-ml]
79 questions
2
votes
0 answers
Is sklearn learning_curve function supported by dask?
I'm computing learning curves for random forests using sklearn. I need to do it for a lot of different RFs, so I want to use a cluster and Dask to reduce the time the RF fits take.
So far I have implemented the following algorithm:
from…

H4dr1en
- 277
- 2
- 11
1
vote
1 answer
Sagemaker Notebook instance error AttributeError: 'MaterializedLayer' object has no attribute 'pack_annotations'
I have a dask cluster active
from dask.distributed import Client, progress
client = Client()
client
When I try to encode my data I get the error:
AttributeError: 'MaterializedLayer' object has no attribute 'pack_annotations'
I encoded the data…

Alejandro
- 119
- 7
1
vote
1 answer
Dask still slower than pandas on large dataset (3.2 GB)
I am currently trying Dask locally (parallel processing) for the first time on a large dataset (3.2 GB). I am comparing Dask's speed with pandas on simple computations. Using Dask seems to result in slower execution time in any task besides reading…

Not_So_Solid_Snake
- 40
- 5
1
vote
1 answer
Apply dask QuantileTransformer to a calculated field in the same dataframe
I'm trying to apply a dask-ml QuantileTransformer transformation to a percentage field, and create a new field percentage_qt in the same dataframe. But I get the error Array assignment only supports 1-D arrays. How to make this work?
import pandas…

ps0604
- 1,227
- 23
- 133
- 330
1
vote
1 answer
Compute list of dask delayed object
I have gone through all the similar questions and the solutions provided, but I am not getting the desired output.
I have a list of dask delayed objects.
for y in ys:
    projection = Projection(data, X, y)
    fi = projection.decode()
    var.append(fi)
where Projection class…

ipj
- 67
- 7
1
vote
1 answer
Issues with dask compute() on labels predicted by KMeans
I am trying to use sklearn MiniBatchKMeans to cluster a fairly large dataset (150k samples and 150k features). I thought I could make things much faster using Incremental from dask_ml to fit my data in chunks. Here is a snippet of my code on a dummy…

coolbeans
- 11
- 2
1
vote
1 answer
Dask with TensorFlow is failing with `CRITICAL - Failed to Serialize` error
I have installed dask[complete], tensorflow, scikeras, deplayed, dask-ml.
I am running the same linked example locally. There are no stack traces in the worker logs either. Please help me with pointers to debug further.
The code is failing with…

RagavMaddali
- 11
- 1
1
vote
0 answers
Dask ML - GaussianNB returns length mismatch error
I am trying to predict my test set using a GaussianNB classifier with Dask. This is what my setup looks like:
X_train = pd.DataFrame.sparse.from_spmatrix(vectorizer.fit_transform(training['X_trn']))
y_train =…

mendy
- 191
- 1
- 12
1
vote
1 answer
KilledWorker Exception
I am using coiled to spin up a cluster and using dask to do some manipulation on a csv read from an S3 bucket. However, at some point my workers are getting killed. When I inspected the logs, the following task is killing them.
distributed.scheduler…

QuantNoob
- 13
- 3
1
vote
0 answers
Dask-ml LabelEncoder.fit_transform() threw AttributeError: 'bool' object has no attribute 'astype'
I tried to apply the LabelEncoder() function to the columns that have object dtype in my Dask dataframe:
le = dm.LabelEncoder()  # dm is dask-ml module
for column in df.columns:
    if df[column].dtype == type(object):
        df[column]…

Nendra Haryo
- 11
- 2
1
vote
1 answer
Impute mean of single column in dask-ml
Calculating and imputing the mean using dask-ml works fine when imputing all the columns that contain np.nan:
imputer = impute.SimpleImputer(strategy='mean')
data = [[100, 2], [np.nan, np.nan], [70, 7]]
df = pd.DataFrame(data, columns = ['Weight',…

ps0604
- 1,227
- 23
- 133
- 330
1
vote
1 answer
Installing dask-ml throws "Solving Environment" error
I'm getting the following errors when trying to install dask-ml with conda. Any ideas how to fix this?
(env3) C:\>conda install -c conda-forge dask-ml
Collecting package metadata (current_repodata.json): done
Solving environment: failed with initial…

ps0604
- 1,227
- 23
- 133
- 330
1
vote
1 answer
Problems implementing Dask MinMaxScaler
I am having problems normalizing a dask.dataframe.core.DataFrame using dask_ml.preprocessing.MinMaxScaler. I am able to use sklearn.preprocessing.MinMaxScaler; however, I wish to use Dask to scale up.
Minimal, Reproducible Example:
# Get data
ddf…

AmyChodorowski
- 392
- 2
- 14
1
vote
0 answers
How to reduce the `dask_ml.xgboost` worker memory consumption?
I've been testing the dask_ml.xgboost regressor on a synthetic 10GB dataset. When training, the memory usage of the workers exceeds the amount available on my local laptop. I am aware that I can try running on an online dask cluster with larger…

Joseph
- 11
- 1
1
vote
0 answers
How much memory is needed for an XGBoost model?
Background:
The training set has 100m rows and about 50 columns, and I have cast the dtypes to the minimum types. Still, the dataframe is about 8-10 GB when loaded.
Training runs on AWS EC2 instances (one is 36 CPU + 72 RAM, another is 16 CPU +…
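As a rough sanity check on the figures above, the in-memory size of a dense numeric table is approximately rows × columns × bytes per value. The sketch below is only an illustration: the helper name `estimate_frame_gb` and the per-value widths are assumptions, not taken from the question, and the estimate ignores the index, any object/string columns, and copies made during training.

```python
def estimate_frame_gb(n_rows: int, n_cols: int, bytes_per_value: int) -> float:
    """Approximate size in GB (10**9 bytes) of a dense numeric table."""
    return n_rows * n_cols * bytes_per_value / 1e9


# Figures from the question: 100 million rows, about 50 columns.
N_ROWS = 100_000_000
N_COLS = 50

# Hypothetical per-value widths (e.g. int8, int16/float16, float32, float64).
for width in (1, 2, 4, 8):
    size = estimate_frame_gb(N_ROWS, N_COLS, width)
    print(f"{width} byte(s) per value: ~{size:.0f} GB")
```

Under these assumptions, the reported 8-10 GB corresponds to an average of roughly 2 bytes per value, which is consistent with aggressively downcast dtypes. It says nothing about the additional memory the training step itself needs.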

Argos.LEE
- 139
- 2
- 6