Highest Voted 'dask-ml' Questions

2

votes

0 answers

Is sklearn learning_curve function supported by dask?

I'm computing learning curves from random forests using sklearn. I need to do this for a lot of different RFs, so I want to use a cluster and Dask to reduce the time spent fitting the RFs. Currently I have implemented the following algorithm: from…

asked May 02 '19 at 13:25

H4dr1en

277
2
11

1

vote

1 answer

Sagemaker Notebook instance error AttributeError: 'MaterializedLayer' object has no attribute 'pack_annotations'

I have a dask cluster active: `from dask.distributed import Client, progress; client = Client(); client`. When I try to encode my data I get the error: AttributeError: 'MaterializedLayer' object has no attribute 'pack_annotations'. I encoded the data…

dask amazon-sagemaker dask-distributed dask-ml distributed-training

asked Jun 15 '22 at 16:03

Alejandro

119
7

1

vote

1 answer

Dask still Slower than Pandas on Large Dataset 3.2 GB

I am currently trying Dask locally (parallel processing) for the first time on a large dataset (3.2 GB). I am comparing Dask's speed with pandas on simple computations. Using Dask seems to result in slower execution time in any task besides reading…

pandas parallel-processing dask dask-dataframe dask-ml

asked Apr 12 '22 at 17:51

Not_So_Solid_Snake

40
5

1

vote

1 answer

Apply dask QuantileTransformer to a calculated field in the same dataframe

I'm trying to apply a dask-ml QuantileTransformer transformation to a percentage field, and create a new field percentage_qt in the same dataframe. But I get the error "Array assignment only supports 1-D arrays". How can I make this work? import pandas…

python dask dask-distributed dask-ml

asked Feb 01 '22 at 22:04

ps0604

1,227
23
133
330

1

vote

1 answer

Compute list of dask delayed object

I have gone through all the similar questions and the solutions provided, but I am not getting the desired output. I have a list of dask delayed objects: for y in ys: projection = Projection(data, X, y) fi = projection.decode() var.append(fi) where the Projection class…

python dask delayed-execution dask-delayed dask-ml

asked Dec 04 '21 at 02:35

ipj

67
7

1

vote

1 answer

Issues with dask compute() on labels predicted by KMeans

I am trying to use sklearn MiniBatchKMeans to cluster a fairly large dataset (150k samples and 150k features). I thought I could make things much faster using Incremental from dask_ml to fit my data in chunks. Here is a snippet of my code on a dummy…

python scikit-learn dask dask-ml

asked Jun 14 '21 at 09:13

coolbeans

11
2

1

vote

1 answer

Dask with tensor flow is failing with `CRITICAL - Failed to Serialize` error

I have installed dask[complete], tensorflow, scikeras, deplayed, dask-ml. I am running the same example (link) locally. There are no stack traces in the worker logs either. Please help me with pointers to debug further. The code is failing with…

python-3.x dask dask-distributed dask-ml

asked Apr 29 '21 at 08:31

RagavMaddali

11
1

1

vote

0 answers

Dask ML - GaussianNB returns length mismatch error

I am trying to predict my test set using a GaussianNB classifier with Dask. This is what my setup looks like: X_train = pd.DataFrame.sparse.from_spmatrix(vectorizer.fit_transform(training['X_trn'])) y_train =…

python dask naivebayes dask-ml

asked Apr 11 '21 at 23:56

mendy

191
1
12

1

vote

1 answer

KilledWorker Exception

I am using coiled to spin up a cluster and using dask to do some manipulation on a csv read from an S3 bucket. However, at some point my workers are getting killed. When I inspected the logs, the following task is killing them. distributed.scheduler…

python dask dask-distributed dask-ml coiled

asked Mar 28 '21 at 17:07

QuantNoob

13
3

1

vote

0 answers

Dask-ml LabelEncoder.fit_transform() threw AttributeError: 'bool' object has no attribute 'astype'

I tried to apply the LabelEncoder() function to the columns that have object dtype in my Dask dataframe: le = dm.LabelEncoder() #dm is dask-ml module for column in df.columns: if df[column].dtype == type(object): df[column]…

python dataframe dask label-encoding dask-ml

asked Jan 21 '21 at 19:01

Nendra Haryo

11
2

1

vote

1 answer

Impute mean of single column in dask-ml

Calculating and imputing the mean using dask-ml works fine when changing all the columns that are np.nan: imputer = impute.SimpleImputer(strategy='mean') data = [[100, 2], [np.nan, np.nan], [70, 7]] df = pd.DataFrame(data, columns = ['Weight',…

python machine-learning dask dask-ml

asked Dec 22 '20 at 14:50

ps0604

1,227
23
133
330

1

vote

1 answer

Installing dask-ml throws "Solving Environment" error

I'm getting the following errors when trying to install dask-ml with conda. Any ideas how to fix this? (env3) C:\>conda install -c conda-forge dask-ml Collecting package metadata (current_repodata.json): done Solving environment: failed with initial…

python dask dask-ml

asked Dec 21 '20 at 14:56

ps0604

1,227
23
133
330

1

vote

1 answer

Problems implementing Dask MinMaxScaler

I am having problems normalizing a dask.dataframe.core.DataFrame using dask_ml.preprocessing.MinMaxScaler. I am able to use sklearn.preprocessing.MinMaxScaler; however, I wish to use dask to scale up. Minimal, Reproducible Example: # Get data ddf…

python dask dask-ml

asked Nov 30 '20 at 16:27

AmyChodorowski

392
2
14

1

vote

0 answers

How to reduce the `dask_ml.xgboost` worker memory consumption?

I've been testing the dask_ml.xgboost regressor on a synthetic 10GB dataset. When training, the memory usage of the workers exceeds the amount available on my local laptop. I am aware that I can try running on an online dask cluster with larger…

dask xgboost dask-ml

asked Nov 25 '20 at 21:31

Joseph

11
1

1

vote

0 answers

How much memory is needed for an XGBoost model?

Background: The training set has 100M rows and about 50 columns, and I have cast the dtypes to the minimum types. Still, the dataframe is about 8-10 GB when loaded. I run training on AWS EC2 instances (one is 36 CPU + 72 RAM, another is 16 CPU +…

python pandas out-of-memory xgboost dask-ml

asked Nov 25 '20 at 02:01

Argos.LEE

139
2
6
