Questions tagged [dask-ml]
79 questions
0 votes, 1 answer
BUG: Dask K-means exception happened: Too many indices for array
I am using K-means clustering on a dataset with shape (563, 207383) via Dask K-means (CPU-based), and I am getting the following error:
"Dask K-means Exception heppen Too many indices for array"
But when I use RapidsAI dask_k-means (GPU-based), it…

Vivek kala
0 votes, 1 answer
Dask-RAPIDS data movement and out-of-memory issue
I am using Dask (2021.3.0) and RAPIDS (0.18) in my project. I perform the preprocessing on the CPU, and the preprocessed data is then transferred to the GPU for K-means clustering. But in this process, I am getting the following…

Vivek kala
0 votes, 1 answer
dask_ml Simple Imputer fails with AttributeError: 'DataFrame' object has no attribute '_data'
I am reading a CSV into a Dask DataFrame and then calling SimpleImputer from the dask_ml library.
I am facing two different kinds of issues.
Issue 1) SimpleImputer on Dask fails with FileNotFound, even though I am able to read the columns.
code:
…

Seeker
0 votes, 0 answers
MySQL server: connection using Dask
I have a dataframe with millions of records. Pulling it into Jupyter takes a lot of memory, and I am unable to do so because the server crashes: there are millions of records in the database.
I got to know about DASK…

Dexter1611
0 votes, 1 answer
Create a category-code map based off a Dask.Series
I have a Dask.Series with a categorical dtype that is known. I want to create a little dataframe which shows the associated mapping without having to compute the entire series. How do I achieve this?
import pandas as pd
import dask.dataframe as…

WolVes
0 votes, 0 answers
How to use XGBoost in Dask?
I was trying to use Dask for the Kaggle fraud detection classification problem.
But when I build the model, it predicts all values as 1.
I am truly surprised, since there are 56,000 zeros and 92 ones in the test data, yet the model somehow…

BhishanPoudel
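With roughly 56,000 negatives and 92 positives, the classes are heavily imbalanced. One common knob is XGBoost's `scale_pos_weight` parameter, whose documentation suggests `sum(negative instances) / sum(positive instances)` as a typical value. The helper below (a hypothetical name) computes that ratio from 0/1 labels. It should be computed on the training labels, and the question does not confirm that this alone fixes the all-ones predictions.

```python
def scale_pos_weight(labels):
    """Return (#negatives / #positives) for binary 0/1 labels."""
    n_pos = sum(1 for y in labels if y == 1)
    n_neg = sum(1 for y in labels if y == 0)
    if n_pos == 0:
        raise ValueError("no positive labels")
    return n_neg / n_pos


# Class counts quoted in the question, used purely as an example.
labels = [0] * 56_000 + [1] * 92
weight = scale_pos_weight(labels)  # roughly 608.7
```

The resulting value would be passed to the classifier as `scale_pos_weight=weight`.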
0 votes, 0 answers
DaskML with XGBoost and using eval_set requires pre-computed data
I am trying to run dask_ml.xgboost using eval_set to allow for early stopping in an attempt to avoid overfitting.
Currently, I have a sample dataset shown in the example below
from dask.distributed import Client
from dask_ml.datasets import…

edesz
0 votes, 1 answer
Nested processes with Dask and Machine learning models
I have a dataset consisting of 100000 samples.
I need to split this dataset into 100 subsets and for each subset train a ML model.
Since the trained models are independent, it's easy to parallelize this part by doing something like
from dask import…

gioxc88
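Because each of the 100 models is independent, one way to express this is with `dask.delayed`: one task per subset, all computed together. A minimal sketch, assuming NumPy arrays and using an ordinary least-squares fit via `np.linalg.lstsq` as a stand-in for the real ML model (`train_subset` is a hypothetical name).

```python
import numpy as np
import dask
from dask import delayed


def train_subset(X, y):
    # Stand-in "model": least-squares coefficients for this subset.
    coef, *_ = np.linalg.lstsq(X, y, rcond=None)
    return coef


rng = np.random.default_rng(0)
X = rng.normal(size=(100_000, 3))
true_coef = np.array([1.0, -2.0, 0.5])
y = X @ true_coef  # noise-free target, so every subset recovers true_coef

# 100 subsets of 1,000 samples each, one delayed task per subset.
X_parts = np.array_split(X, 100)
y_parts = np.array_split(y, 100)
tasks = [delayed(train_subset)(Xi, yi) for Xi, yi in zip(X_parts, y_parts)]

# Run all independent fits in parallel (threaded scheduler by default).
models = dask.compute(*tasks)
```

Each task is self-contained, so no nested scheduling is needed; the same list of tasks can be submitted to a distributed `Client` unchanged.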
0 votes, 1 answer
Error while importing Dask: module 'dask.array' has no attribute 'blockwise'
I am trying to use Dask to speed up computation, because logistic regression aborted after 17 hours on my system.
My dataset has about 1 million rows.
I first ran these commands:
import dask.array as da
import dask.dataframe as dd
from dask.distributed import…

Sakshi Jajodia
0 votes, 1 answer
Dask-ML LogisticRegression throws this error: "NotImplementedError: Can not add intercept to array with unknown chunk shape"
Hello, I am new to Dask-ML. I have been trying to use dask_ml to train a logistic regression model to predict tweet sentiment. I converted a pandas DataFrame to a Dask DataFrame and then performed a train-test split. After that, I used hashing…

Sabbir Talukdar
0 votes, 1 answer
Dask ML won't connect to remote cluster
I've connected to my remote cluster via Client, now I'm trying to use Dask-ml
from sklearn.ensemble import RandomForestClassifier
from sklearn.externals import joblib
#import dask_ml.joblib
clf = RandomForestClassifier(n_estimators=200,…

kaysuez
0 votes, 1 answer
Can you use dask_ml kmeans on a dask array?
I have the following code:
feature_array = da.concatenate(features, axis=1)#.compute()
model = KMeans(n_clusters=4)
model.fit(features, y=None)
Now, if I compute feature_array first, this code runs just fine, but without it, it gives some internal…

FlorianEn
0 votes, 2 answers
Dask Distributed Client - ERROR - Failed to reconnect to scheduler after 10.00 seconds, closing client
I am doing a very simple data transformation with Dask_ML and I am getting this error. Has anyone encountered it? It looks like a system setting that can be modified.
df.head()
ReportDate_Time 2015-05-01 2015-06-01 2015-07-01 …

Odisseo
0 votes, 1 answer
How to create a dask-array from CuPy array?
I'm trying to run dask_ml.cluster.KMeans on a huge amount of data.
Working on the CPU is fine, since I wrap NumPy arrays with dask.array.
Working on the GPU doesn't seem to be possible, because some functionality is not implemented in CuPy.
I've tried to…

Rostislav Povelikin
0 votes, 0 answers
How to use the Apriori algorithm on a Dask DataFrame?
I want to use the Apriori algorithm on my dataset to find associated products. But my data has 14 million records, so I can't use it directly with mlxtend. I have loaded the data into a Dask DataFrame.
Could anyone help me solve this…

MAHESH DIVAKARAN