Highest Voted 'dask-ml' Questions

0 votes, 1 answer

dask-ml preprocessing raises AttributeError

I use a Dask DataFrame and dask-ml to manipulate my data. When I use the dask-ml MinMaxScaler, I get this error. Is there a way to prevent this error and make it work? import dask.dataframe as dd from dask_ml.preprocessing import MinMaxScaler df =…

asked Nov 06 '22 at 18:13

Saeide

0 votes, 1 answer

How to convert multiple 2D arrays to 1D columns using xarray and Dask in Python?

I have seven 2D cloud-optimised GeoTIFFs stacked into one DataArray in xarray. They are very large, so I am using the intake-xarray extension and Dask to stream the data from S3 without loading it into RAM. I have concatenated them along their "band"…

python dask python-xarray dask-ml

asked Jul 26 '22 at 09:43

wmitchell93

0 votes, 1 answer

'DataFrame' object has no attribute 'to_delayed'?

I am using a random forest model from scikit-learn and BlockwiseVotingRegressor from dask-ml. Code: Error:

dask dask-distributed dask-dataframe dask-delayed dask-ml

asked Jul 06 '22 at 09:52

Professor

0 votes, 0 answers

size of labels must equal number of rows error

I am attempting to fit a model using xgboost in dask. Here is the code I'm using to fit the model: clf = xgboost.dask.DaskXGBClassifier(**params) clf.client = client clf.fit(X_train, Y_train) Modeled after this example in the documentation:…

python dask xgboost dask-ml

asked Jul 06 '22 at 05:24

bad_at_dask

0 votes, 0 answers

Kernel restarts when training a sklearn regression model in SageMaker

I have been trying to train a regression model on big data on AWS SageMaker. On my last attempt I used an ml.m5.12xlarge instance and was confident it would work this time, but it did not: I still get the error. After a few minutes of training I get…

scikit-learn regression amazon-sagemaker dask-ml

asked Jun 26 '22 at 07:10

Alejandro

0 votes, 1 answer

Dask-Error: Could not serialize object of type tuple

I am trying to run models on genomic data using Dask, but I get an error when I standardize or otherwise process the data. I am working on a SLURM cluster, so I first start a cluster: cluster = SLURMCluster( cores=16, …

python dask python-xarray slurm dask-ml

asked May 14 '22 at 00:45

Christine

0 votes, 1 answer

How to load a huge model on Dask with limited RAM?

I want to load an Annoy model on Dask. The model is 60 GB, but Dask has only 2 GB of RAM. Is there a way to load the model in a distributed manner?

dask dask-distributed dask-ml annoy

asked Feb 11 '22 at 06:38

n0obcoder

0 votes, 1 answer

How can I run GridSearchCV in dask_ml despite this error?

This is my code in Google Colab: import cupy as cp import numpy as np import joblib import dask_ml.model_selection as dcv def ParamSelection(X, Y, nfolds): param_grid = {'C': [0.001, 0.01, 0.1, 1, 10, 100],'kernel':['linear'], 'gamma':[0.001,…

gridsearchcv rapids dask-ml

asked Oct 31 '21 at 20:11

Roman Micuda

0 votes, 1 answer

How to connect to an Oracle database and export the data to CSV using Dask?

How can I connect to an Oracle database using Dask, fetch data from it, and write the fetched data to a CSV file?

dask dask-distributed dask-delayed dask-dataframe dask-ml

asked Sep 29 '21 at 11:15

Hemanth Kumar

0 votes, 0 answers

Dask-ml ParallelPostFit not using distributed and causing memory error on local machine

I want to make random forest predictions on a large dataset and save the result as a dataframe. I read https://examples.dask.org/machine-learning/parallel-prediction.html and it says "Workers can write the predicted values to a shared file system,…

dask dask-distributed dask-dataframe dask-ml

asked Sep 15 '21 at 08:23

Wacken0013

0 votes, 1 answer

How can I use dask_ml preprocessing in a Dask distributed cluster?

How can I do dask_ml preprocessing in a Dask distributed cluster? My dataset is about 200 GB, and every time I categorize it in preparation for OneHotEncoding, Dask seems to ignore the client and tries to load the dataset in the local…

dask dask-distributed dask-delayed dask-dataframe dask-ml

asked Jul 09 '21 at 15:19

wml

0 votes, 1 answer

Out of memory error with Dask and cudf loop

I am using Dask and RAPIDS to run an XGBoost model on a large (6.9 GB) dataset. The hardware is four 2080 Ti GPUs with 11 GB of memory each. The raw dataset has a few dozen target columns that have been one-hot encoded, so I am trying to run a loop that…

python dask rapids cudf dask-ml

asked Jun 08 '21 at 16:01

datahappy

0 votes, 1 answer

Why does dask_ml.preprocessing.OrdinalEncoder.transform produce a result that is not ordinally encoded?

I'm confused with regard to the result of dask_ml.preprocessing.OrdinalEncoder.transform: from sklearn.preprocessing import OrdinalEncoder from dask_ml.preprocessing import OrdinalEncoder as DaskOrdinalEncoder import numpy as np import pandas as…

dask dask-dataframe dask-ml

asked May 07 '21 at 11:12

Raffael

0 votes, 1 answer

Reduce dask XGBoost memory consumption

I am writing a simple script to train an XGBoost predictor on my dataset. This is the code I am using: import dask.dataframe as dd import dask_ml from dask.distributed import Client, LocalCluster import sys from dask_ml.model_selection import…

python dask xgboost dask-distributed dask-ml

asked May 01 '21 at 10:41

Mattia Surricchio

0 votes, 1 answer

joblib connection to Dask backend: tornado.iostream.StreamClosedError: Stream is closed

I am running a simple program on my dask worker. Below is the program. import numpy as np from dask.distributed import Client import joblib from sklearn.datasets import load_digits from sklearn.model_selection import RandomizedSearchCV from…

python-3.x tornado dask-distributed joblib dask-ml

asked Mar 23 '21 at 12:11

Seeker
