Questions tagged [dask-ml]
79 questions
0
votes
1 answer
dask-ml preprocessing raises AttributeError
I use a Dask dataframe and dask-ml to manipulate my data. When I use the dask-ml MinMaxScaler, I get this error. Is there a way to prevent this error and make it work?
import dask.dataframe as dd
from dask_ml.preprocessing import MinMaxScaler
df =…

Saeide
- 149
- 2
- 14
0
votes
1 answer
How to convert multiple 2D arrays to 1D columns using xarray and dask in python?
I have seven 2D Cloud Optimized GeoTIFFs stacked into one data array in xarray. They are very large, so I am using the intake-xarray extension and dask to stream the data from S3 without using any RAM. I have concatenated them along their "band"…

wmitchell93
- 41
- 7
0
votes
1 answer
'DataFrame' object has no attribute 'to_delayed'?
I am using a random forest model from scikit-learn and BlockwiseVotingRegressor from dask-ml.
Code:
Error:

Professor
- 87
- 6
0
votes
0 answers
size of labels must equal number of rows error
I am attempting to fit a model using xgboost in dask.
Here is the code I'm using to fit the model:
clf = xgboost.dask.DaskXGBClassifier(**params)
clf.client = client
clf.fit(X_train, Y_train)
Modeled after this example in the documentation:…

bad_at_dask
- 1
- 1
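For the xgboost question above, the excerpt is truncated, so the cause is not known. One plausible explanation, stated here as an assumption, is that `X_train` and `Y_train` are partitioned differently, so some partition pairs hold different numbers of rows. The sketch below compares per-partition row counts. The helper name `find_mismatched_partitions` and the sample lengths are hypothetical; with Dask dataframes the lengths could be collected with `X_train.map_partitions(len).compute()`, and with Dask arrays from `X_train.chunks[0]`.

```python
from typing import List, Sequence, Tuple


def find_mismatched_partitions(
    x_lengths: Sequence[int], y_lengths: Sequence[int]
) -> List[Tuple[int, int, int]]:
    """Return (partition index, rows in X, rows in y) for each pair that differs."""
    if len(x_lengths) != len(y_lengths):
        raise ValueError(
            f"X has {len(x_lengths)} partitions but y has {len(y_lengths)}"
        )
    return [
        (i, nx, ny)
        for i, (nx, ny) in enumerate(zip(x_lengths, y_lengths))
        if nx != ny
    ]


# Hypothetical per-partition row counts for features and labels.
x_lengths = [100, 100, 50]
y_lengths = [100, 80, 70]

mismatches = find_mismatched_partitions(x_lengths, y_lengths)
print(mismatches)
```

If the check reports mismatches, deriving the features and labels from the same dataframe, or repartitioning both with the same divisions, may be worth trying.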
0
votes
0 answers
Kernel restarts when training a sklearn regression model in Sagemaker
I have been trying to train a regression model, with big data on AWS Sagemaker.
The instance I used on my last try was ml.m5.12xlarge, and I was confident it would work this time, but no. I still get the error.
After some minutes of training I get…

Alejandro
- 119
- 7
0
votes
1 answer
Dask-Error: Could not serialize object of type tuple
I am trying to run models on genomic data using Dask, but I get an error when I standardize or process the data.
I am working on a SLURM-Cluster. Therefore, first I am starting a cluster:
cluster = SLURMCluster(
    cores=16,
…

Christine
- 53
- 8
0
votes
1 answer
How to load a huge model on Dask with limited RAM?
I want to load a model (an ANNOY model) on Dask. The size of the model is 60GB and Dask RAM is only 2GB. Is there a way to load the model in a distributed manner as well?

n0obcoder
- 649
- 8
- 24
0
votes
1 answer
How can I run GridSearchCV in dask_ml despite this error?
This is my code in Google Colab:
import cupy as cp
import numpy as np
import joblib
import dask_ml.model_selection as dcv
def ParamSelection(X, Y, nfolds):
    param_grid = {'C': [0.001, 0.01, 0.1, 1, 10, 100], 'kernel': ['linear'], 'gamma': [0.001,…

Roman Micuda
- 1
- 2
0
votes
1 answer
How to connect to an Oracle database and export the data to CSV format using dask?
How can I connect to an Oracle database using dask, fetch data from it, and create a CSV file from the fetched data?

Hemanth Kumar
- 1
- 4
0
votes
0 answers
Dask-ml ParallelPostFit not using distributed and causing memory error on local machine
I want to do Random Forest predictions on a large dataset and save the result as a dataframe. I read https://examples.dask.org/machine-learning/parallel-prediction.html and it says "Workers can write the predicted values to a shared file system,…

Wacken0013
- 11
- 3
0
votes
1 answer
How can I use dask_ml preprocessing in a dask distributed cluster
How can I do dask_ml preprocessing in a dask distributed cluster? My dataset is about 200GB, and every time I categorize the dataset in preparation for OneHotEncoding, it looks like dask ignores the client and tries to load the dataset in the local…

wml
- 1
0
votes
1 answer
Out of memory error with Dask and cudf loop
I am using Dask and RAPIDS to run an xgboost model on a large (6.9GB) dataset. The hardware is 4x 2080 Ti GPUs with 11 GB of memory each. The raw dataset has a few dozen target columns that have been one-hot encoded, so I am trying to run a loop that…

datahappy
- 826
- 2
- 11
- 29
0
votes
1 answer
Why does dask_ml.preprocessing.OrdinalEncoder.transform produce a result that is not ordinally encoded?
I'm confused with regard to the result of dask_ml.preprocessing.OrdinalEncoder.transform:
from sklearn.preprocessing import OrdinalEncoder
from dask_ml.preprocessing import OrdinalEncoder as DaskOrdinalEncoder
import numpy as np
import pandas as…

Raffael
- 19,547
- 15
- 82
- 160
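For the OrdinalEncoder question above, the excerpt does not show the data, so the following is only a guess at the cause. To my understanding, dask-ml's `OrdinalEncoder` replaces columns of categorical dtype with their integer codes and leaves other columns unchanged, so a plain object column would pass through untouched. The sketch below uses pandas only, with made-up sample data, to show the categorical conversion and the codes it yields.

```python
import pandas as pd

# Made-up sample data; "color" starts out as a plain string column.
df = pd.DataFrame(
    {"color": ["red", "green", "blue", "green"], "size": [1, 2, 3, 2]}
)

# A string column carries no category information, so convert it explicitly.
df["color"] = df["color"].astype("category")

# By default the categories are the sorted unique values.
categories = list(df["color"].cat.categories)
codes = df["color"].cat.codes.tolist()

print(categories)
print(codes)
```

With a Dask dataframe, the analogous conversion would be `ddf.categorize(columns=["color"])` or `dask_ml.preprocessing.Categorizer` before applying the encoder.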
0
votes
1 answer
Reduce dask XGBoost memory consumption
I am writing a simple script to train an XGBoost predictor on my dataset.
This is the code I am using:
import dask.dataframe as dd
import dask_ml
from dask.distributed import Client, LocalCluster
import sys
from dask_ml.model_selection import…

Mattia Surricchio
- 1,362
- 2
- 21
- 49
0
votes
1 answer
joblib connection to Dask backend: tornado.iostream.StreamClosedError: Stream is closed
I am running a simple program on my dask worker. Below is the program.
import numpy as np
from dask.distributed import Client
import joblib
from sklearn.datasets import load_digits
from sklearn.model_selection import RandomizedSearchCV
from…

Seeker
- 163
- 1
- 12