Questions tagged [dask-ml]

79 questions
0
votes
1 answer

BUG: Dask K-means Exception heppen Too many indices for array

I am using K-means clustering on a dataset with shape (563, 207383) via Dask K-means (CPU-based), and am getting the following error: "Dask K-means Exception heppen Too many indices for array". But when I use RapidsAI dask_k-means (GPU-based) it…
Vivek kala
  • 23
  • 3
0
votes
1 answer

Dask-Rapids data movement and out-of-memory issue

I am using Dask (2021.3.0) and RAPIDS (0.18) in my project. I perform a preprocessing task on the CPU, and the preprocessed data is later transferred to the GPU for K-means clustering. But in this process, I am getting the following…
Vivek kala
  • 23
  • 3
0
votes
1 answer

dask_ml Simple Imputer fails with AttributeError: 'DataFrame' object has no attribute '_data'

I am reading a CSV into a Dask DataFrame and then calling SimpleImputer from the dask_ml library. I am facing two different kinds of issues. Issue 1) SimpleImputer on Dask fails with FileNotFound when in reality I am able to read the columns. code: …
Seeker
  • 163
  • 1
  • 12
0
votes
0 answers

MySQL server : connection using dask

I have a dataframe with millions of records, and pulling it into Jupyter takes so much memory that the server crashes. I got to know about DASK…
Dexter1611
  • 492
  • 1
  • 4
  • 15
0
votes
1 answer

Create a category-code map based off a Dask.Series

I have a Dask.Series with a categorical dtype that is known. I want to create a little dataframe which shows the associated mapping without having to compute the entire series. How do I achieve this? import pandas as pd import dask.dataframe as…
WolVes
  • 1,286
  • 2
  • 19
  • 39
0
votes
0 answers

How to use xgboost in dask?

I was trying to use Dask for a Kaggle fraud detection classification problem. But when I build the model, it predicts all the values as 1. I am truly surprised, since there are 56,000 zeros and 92 ones in the test data, yet the model somehow…
BhishanPoudel
  • 15,974
  • 21
  • 108
  • 169
0
votes
0 answers

DaskML with XGBoost and using eval_set requires pre-computed data

I am trying to run dask_ml.xgboost using eval_set to allow for early stopping in an attempt to avoid overfitting. Currently, I have a sample dataset shown in the example below from dask.distributed import Client from dask_ml.datasets import…
edesz
  • 11,756
  • 22
  • 75
  • 123
0
votes
1 answer

Nested processes with Dask and Machine learning models

I have a dataset consisting of 100000 samples. I need to split this dataset into 100 subsets and train an ML model for each subset. Since the trained models are independent, it's easy to parallelize this part by doing something like from dask import…
gioxc88
  • 349
  • 1
  • 9
0
votes
1 answer

Error while importing DASK: module 'dask.array' has no attribute 'blockwise'

I am trying to use DASK for fast computing as logistic regression aborted after 17 hours on my system. My data set is about 1 million rows. I first ran these commands: import dask.array as da import dask.dataframe as dd from dask.distributed import…
0
votes
1 answer

Dask-ML LogisticRegression throws this error: "NotImplementedError: Can not add intercept to array with unknown chunk shape"

Hello, I am new to Dask-ML. I have been trying to use it to train a logistic regression model to predict tweet sentiment. I converted a pandas dataframe to a Dask dataframe, then performed a train/test split. After that I used hashing…
Sabbir Talukdar
  • 115
  • 2
  • 11
0
votes
1 answer

Dask ML won't connect to remote cluster

I've connected to my remote cluster via Client, now I'm trying to use Dask-ml from sklearn.ensemble import RandomForestClassifier from sklearn.externals import joblib #import dask_ml.joblib clf = RandomForestClassifier(n_estimators=200,…
kaysuez
  • 47
  • 1
  • 7
0
votes
1 answer

Can you use dask_ml kmeans on a dask array?

I have the following code: feature_array = da.concatenate(features, axis=1)#.compute() model = KMeans(n_clusters=4) model.fit(features, y=None) Now if I compute feature_array first this code runs just fine, but without it it gives some internal…
FlorianEn
  • 110
  • 9
0
votes
2 answers

Dask Distributed Client - ERROR - Failed to reconnect to scheduler after 10.00 seconds, closing client

I am doing a very simple data transformation with Dask_ML and I am getting this error, I was wondering if anyone has encountered this. Looks like a system setting that can be modified? df.head() ReportDate_Time 2015-05-01 2015-06-01 2015-07-01 …
Odisseo
  • 747
  • 1
  • 13
  • 32
0
votes
1 answer

How to create a dask-array from CuPy array?

I'm trying to launch dask.cluster.Kmeans with a huge amount of data. Working with the CPU is OK since I wrap NumPy arrays with dask.array. Working with the GPU doesn't seem to be possible due to functionality not implemented in CuPy. I've tried to…
0
votes
0 answers

How to use the Apriori algorithm on a Dask DataFrame?

I want to use the Apriori algorithm on my dataset to find associated products. But the data has 14 million records, so I can't use it directly with mlxtend. I have loaded the data into a Dask DataFrame. Could anyone help me solve this…