Questions tagged [dask-ml]

79 questions
0
votes
1 answer

dask-ml preprocessing raises AttributeError

I use a Dask DataFrame and dask-ml to manipulate my data. When I use the dask-ml MinMaxScaler, I get this error. Is there a way to prevent this error and make it work? import dask.dataframe as dd from dask_ml.preprocessing import MinMaxScaler df =…
Saeide
  • 149
  • 2
  • 14
0
votes
1 answer

How to convert multiple 2D arrays to 1D columns using xarray and dask in python?

I have seven 2D cloud-optimised GeoTIFFs stacked into one DataArray in xarray. They are very large, so I am using the intake-xarray extension and Dask to stream the data from S3 without using any RAM. I have concatenated them along their "band"…
0
votes
1 answer

'DataFrame' object has no attribute 'to_delayed'?

I am using a random forest model from scikit-learn and BlockwiseVotingRegressor from dask-ml. Code: Error:
0
votes
0 answers

size of labels must equal number of rows error

I am attempting to fit a model using xgboost in dask. Here is the code I'm using to fit the model: clf = xgboost.dask.DaskXGBClassifier(**params) clf.client = client clf.fit(X_train, Y_train) Modeled after this example in the documentation:…
0
votes
0 answers

Kernel restarts when training a sklearn regression model in SageMaker

I have been trying to train a regression model with big data on AWS SageMaker. The instance I used on my last try was ml.m5.12xlarge, and I was confident it would work this time, but it did not. I still get the error. After some minutes in the training I get…
Alejandro
  • 119
  • 7
0
votes
1 answer

Dask-Error: Could not serialize object of type tuple

I am trying to run models on genomic data using Dask, but I get an error when I standardize or process the data. I am working on a SLURM cluster, so I first start a cluster: cluster = SLURMCluster( cores=16, …
Christine
  • 53
  • 8
0
votes
1 answer

How to load a huge model on Dask with limited RAM?

I want to load a model (an ANNOY model) on Dask. The size of the model is 60GB and the Dask RAM is only 2GB. Is there a way to load the model in a distributed manner as well?
n0obcoder
  • 649
  • 8
  • 24
0
votes
1 answer

How can I run GridSearchCV in dask_ml despite this error?

This is my code in Google Colab: import cupy as cp import numpy as np import joblib import dask_ml.model_selection as dcv def ParamSelection(X, Y, nfolds): param_grid = {'C': [0.001, 0.01, 0.1, 1, 10, 100],'kernel':['linear'], 'gamma':[0.001,…
0
votes
1 answer

How to connect to an Oracle database and export the data to CSV format using dask?

How can I connect to an Oracle database using dask, fetch the data from it, and create a CSV file from the fetched data?
0
votes
0 answers

Dask-ml ParallelPostFit not using distributed and causing memory error on local machine

I want to do Random Forest predictions on a large dataset and save the result as a dataframe. I read https://examples.dask.org/machine-learning/parallel-prediction.html and it says "Workers can write the predicted values to a shared file system,…
Wacken0013
  • 11
  • 3
0
votes
1 answer

How can I use dask_ml preprocessing in a dask distributed cluster?

How can I do dask_ml preprocessing in a dask distributed cluster? My dataset is about 200GB, and every time I categorize the dataset in preparation for OneHotEncoding, it looks like dask is ignoring the client and trying to load the dataset in the local…
0
votes
1 answer

Out of memory error with Dask and cudf loop

I am using Dask and Rapidsai to run an xgboost model on a large (6.9GB) dataset. The hardware is 4x 2080 TIs with 11 GB of memory each. The raw dataset has a few dozen target columns that have been one-hot encoded, so I am trying to run a loop that…
datahappy
  • 826
  • 2
  • 11
  • 29
0
votes
1 answer

Why does dask_ml.preprocessing.OrdinalEncoder.transform produce a not ordinally encoded result?

I'm confused with regard to the result of dask_ml.preprocessing.OrdinalEncoder.transform: from sklearn.preprocessing import OrdinalEncoder from dask_ml.preprocessing import OrdinalEncoder as DaskOrdinalEncoder import numpy as np import pandas as…
Raffael
  • 19,547
  • 15
  • 82
  • 160
0
votes
1 answer

Reduce dask XGBoost memory consumption

I am writing a simple script to train an XGBoost predictor on my dataset. This is the code I am using: import dask.dataframe as dd import dask_ml from dask.distributed import Client, LocalCluster import sys from dask_ml.model_selection import…
Mattia Surricchio
  • 1,362
  • 2
  • 21
  • 49
0
votes
1 answer

joblib connection to Dask backend: tornado.iostream.StreamClosedError: Stream is closed

I am running a simple program on my dask worker. Below is the program. import numpy as np from dask.distributed import Client import joblib from sklearn.datasets import load_digits from sklearn.model_selection import RandomizedSearchCV from…
Seeker
  • 163
  • 1
  • 12