Highest Voted 'dask-distributed' Questions

0

votes

0 answers

Dask distributed with numba giving error

I am trying to use numba with dask in a simple groupby operation on a dataset. It works fine on a single system, but when I move on to a distributed one it gives an error that I am unable to get past. Please help. Thanks…

asked Aug 22 '18 at 11:18

Sweta

63
3
13

0

votes

1 answer

Error: No module name 'Custom Class' while passing a Client object in the custom class's constructor in dask

I have been trying to write custom classes for Preprocessing followed by Feature selection and Machine Learning algorithms as well. I cracked this (preprocessing only) using @delayed. But when I read from the tutorials that the same can be achieved…

python dask dask-distributed dask-delayed

asked Aug 10 '18 at 15:28

Asif Ali

1,422
2
12
28

0

votes

1 answer

Dask client runs out of memory loading from S3

I have an S3 bucket with a lot of small files, over 100K, that add up to about 700GB. When I load the objects from a data bag and then persist, the client always runs out of memory, consuming gigabytes very quickly. Limiting the scope to a few hundred…

dask dask-distributed

asked Aug 07 '18 at 02:56

Kevin McGrath

146
1
5

0

votes

1 answer

dask-jobqueue does not start any worker on slurm cluster

I am trying to run dask on a research cluster managed by slurm. Launching a job with a classical sbatch script is working. But when I am doing: from dask_jobqueue import SLURMCluster cluster = SLURMCluster(cores=12, memory='24 GB', processes=1,…

dask slurm dask-distributed

asked Aug 01 '18 at 17:41

LCT

233
1
7

0

votes

1 answer

How to implement `iloc` function for dask dataframe?

I have a huge file, around 35GB, stored in HDF5 format. I have to do certain calculations on some specific columns and want to insert the results as new columns. I know I can assign new columns directly as df['new_column'] = 0 (or some other…

python pandas data-science dask dask-distributed

asked Aug 01 '18 at 09:43

Urvish

643
3
10
19

0

votes

1 answer

dask distributed: adding up a collection of vectors residing on different workers

I have a large set of vectors that were computed on different data, thus they reside on different workers. Is the following code the most efficient? grads = [client.submit(compute_grad, x) for x in xs] # list of futures gradsum_future =…

dask dask-distributed

asked Jul 28 '18 at 21:55

John

935
6
17
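A pairwise (tree) reduction is one common way to add up many partial results. The sketch below is not the asker's code, which is truncated in the excerpt, and it does not use dask: the standard library's ThreadPoolExecutor stands in for the cluster, and add_vectors and tree_sum are hypothetical helper names chosen for illustration.

```python
from concurrent.futures import ThreadPoolExecutor


def add_vectors(a, b):
    """Element-wise sum of two equal-length vectors (plain lists)."""
    return [x + y for x, y in zip(a, b)]


def tree_sum(executor, vectors):
    """Sum vectors pairwise, level by level, until one remains.

    `executor` is any object with a concurrent.futures-style submit().
    This is an illustrative stand-in, not a dask API.
    """
    level = list(vectors)
    if not level:
        raise ValueError("need at least one vector")
    while len(level) > 1:
        # Submit one addition per adjacent pair; pairs run concurrently.
        futures = [
            executor.submit(add_vectors, level[i], level[i + 1])
            for i in range(0, len(level) - 1, 2)
        ]
        next_level = [f.result() for f in futures]
        # An odd vector out is carried to the next level unchanged.
        if len(level) % 2 == 1:
            next_level.append(level[-1])
        level = next_level
    return level[0]


with ThreadPoolExecutor(max_workers=4) as pool:
    total = tree_sum(pool, [[1, 2], [3, 4], [5, 6]])
```

With n vectors this takes about log2(n) rounds of additions instead of n - 1 sequential ones. Whether that helps on a real cluster depends on where the data lives and how expensive transfers are.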

0

votes

1 answer

difference between client and executor in dask

Executor is the primary entry point for users of distributed. Similarly, Client is the primary entry point for users of dask.distributed. So the two seem identical. In dask, can they be used interchangeably? If yes, what is the use case to use…

client dask executor dask-distributed

asked Jul 26 '18 at 08:56

Sweta

63
3
13

0

votes

1 answer

compute() in dask not working

I am trying a simple parallel computation in Dask. This is my code. import time import dask as dask import dask.distributed as distributed import dask.dataframe as dd import dask.delayed as delayed from dask.distributed import…

dataframe dask dask-distributed

asked Jul 19 '18 at 12:58

Sweta

63
3
13

0

votes

1 answer

Parallelization on cluster dask

I'm looking for the best way to parallelize the following problem on a cluster. I have several files folder/file001.csv folder/file002.csv : folder/file100.csv They are disjoint with respect to the key I want to use to groupby, that is if a set…

dask dask-distributed

asked Jul 18 '18 at 13:04

rpanai

12,515
2
42
64

0

votes

1 answer

processes =false in local distribution in dask

I read the Dask documentation. In the section on the local distributed setup it says client = Client(processes=False). I would like to know why processes is set to False.

macos python-2.7 dask dask-distributed

asked Jul 11 '18 at 12:45

Sweta

63
3
13

0

votes

1 answer

How is dask implemented on multiple systems?

I am new to the Dask library. I wanted to know: if we implement parallel computation using dask on two systems, is the data frame on which we apply the computation stored on both systems? How does the parallel computation actually take place, it…

python-2.7 parallel-processing dask dask-distributed

asked Jul 03 '18 at 13:11

Sweta

63
3
13

0

votes

0 answers

Custom search in Dask

I have 1000 regex patterns which I have to search for in each of 9000 strings. A normal brute-force method using a pandas list took 25 min for the same task. I have used dask's delayed function to parallelize the entire function. It took 9 min to…

python dask dask-distributed dask-delayed

asked Jul 03 '18 at 09:47

ANKIT JHA

359
1
3
9
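For the regex search described in the question above, a plain-Python baseline is worth having before parallelizing. The following is a minimal sketch under stated assumptions, not the asker's code: find_matches is a hypothetical helper, and the patterns and strings are made-up sample data. It compiles each pattern once and reuses it across all strings.

```python
import re


def find_matches(patterns, strings):
    """Return {string index: [patterns that match]} for strings with hits.

    Patterns are compiled once up front so the per-string loop only
    pays for the search itself.
    """
    compiled = [re.compile(p) for p in patterns]
    hits = {}
    for idx, text in enumerate(strings):
        matched = [p.pattern for p in compiled if p.search(text)]
        if matched:
            hits[idx] = matched
    return hits


# Hypothetical sample data, for illustration only.
patterns = [r"\bdask\b", r"\d{4}", r"^error"]
strings = ["dask 2018 release", "no match here", "error: worker died"]
result = find_matches(patterns, strings)
```

Because each string is processed independently, the outer loop is the natural unit to split across workers if the single-process version is still too slow.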

0

votes

2 answers

Confusion regarding cluster scheduler and single machine distributed scheduler

In the code below, why is dd.read_csv running on the cluster? client.read_csv should run on the cluster. import dask.dataframe as dd from dask.distributed import Client client=Client('10.31.32.34:8786') dd.read_csv('file.csv',blocksize=10e7) dd.compute() Is…

dask dask-distributed

asked Jun 28 '18 at 11:30

Dhruv Kumar

399
2
13

0

votes

1 answer

Another UI for Dask except bokeh

Is there another Dask UI besides bokeh? I have a problem with bokeh: it does not show the graph and UI when running on an EC2 instance.

dask dask-distributed

asked Jun 27 '18 at 07:25

Dhruv Kumar

399
2
13

0

votes

0 answers

AttributeError: 'S3File' object has not attribute 'getvalue', while running to_csv

I'm running the to_csv command as follows to write an output file to an S3 bucket with ServerSideEncryption enabled: to_csv("s3://mys3bucket/result.csv", storage_option={'s3_additional_kwargs': {'ServerSideEncryption': 'AES256'}}) I'm getting…

amazon-s3 dask dask-distributed

asked Jun 26 '18 at 04:46

Dhruv Kumar

399
2
13
