Questions tagged [dask-distributed]

Dask.distributed is a lightweight library for distributed computing in Python. It extends both the concurrent.futures and dask APIs to moderate-sized clusters.
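
Because dask.distributed extends the concurrent.futures API, the basic submit-and-collect pattern can be previewed with the standard library alone; `dask.distributed.Client` accepts the same shape of code via its `submit`/`gather` methods. A minimal sketch (`square` is an illustrative placeholder, not dask API):

```python
from concurrent.futures import ThreadPoolExecutor

def square(x):
    return x * x

# dask.distributed.Client exposes the same submit()/result() pattern, so code
# written against the stdlib executor API ports to a cluster with few changes.
with ThreadPoolExecutor(max_workers=4) as pool:
    futures = [pool.submit(square, i) for i in range(10)]
    results = [f.result() for f in futures]
```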

1090 questions
5
votes
1 answer

Why am I getting dask warnings when running a pandas operation?

I have a notebook with both pandas and dask operations. When I have not started the client, everything is as expected. But once I start the dask.distributed client, I get warnings in cells where I'm running pandas operations e.g.…
birdsarah
  • 1,165
  • 8
  • 20
5
votes
2 answers

Override dask scheduler to concurrently load data on multiple workers

I want to run graphs/futures on my distributed cluster which all have a 'load data' root task and then a bunch of training tasks that run on that data. A simplified version would look like this: from dask.distributed import Client client =…
user8871302
  • 123
  • 7
5
votes
2 answers

Progress reporting on dask's set_index

I am trying to wrap a progress indicator around the entire script. However, set_index(..., compute=False) still runs tasks on the scheduler, as can be observed in the web interface. How do I report on the progress of the set_index step? import…
kadrach
  • 408
  • 6
  • 11
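
For futures, dask.distributed ships a `progress()` helper; the underlying count-tasks-as-they-complete pattern it builds on looks like this in stdlib terms (`work` is a placeholder task):

```python
from concurrent.futures import ThreadPoolExecutor, as_completed
import time

def work(i):
    time.sleep(0.01)          # stand-in for a real task
    return i

with ThreadPoolExecutor(max_workers=4) as pool:
    futures = [pool.submit(work, i) for i in range(20)]
    done = 0
    for _ in as_completed(futures):   # yields each future as it finishes
        done += 1                     # a progress bar would redraw here
```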
5
votes
1 answer

Safe & performant way to modify dask dataframe

As part of a data workflow I need to modify values in a subset of dask dataframe columns and pass the results on for further computation. In particular, I'm interested in two cases: mapping columns and mapping partitions. What is the recommended safe &…
evilkonrex
  • 255
  • 2
  • 10
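
The partition-mapping case can be prototyped at the pandas level first: write a pure function over a `pandas.DataFrame` and hand it to dask's `map_partitions`. A sketch with hypothetical columns `a` and `b` (only the pandas part runs here):

```python
import pandas as pd

def transform(part: pd.DataFrame) -> pd.DataFrame:
    # Return a new frame instead of mutating in place; dask may apply this
    # to each partition independently, so side effects are unsafe.
    return part.assign(a=part["a"] * 2, b=part["b"].str.upper())

df = pd.DataFrame({"a": [1, 2, 3], "b": ["x", "y", "z"]})
out = transform(df)

# With a dask dataframe the same function would be mapped over partitions:
#   ddf.map_partitions(transform)
```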
5
votes
1 answer

Dask Distributed Unable to locate credentials

I can't access my files on S3 using a dataframe read: df_read_csv. I get the error: Exception: Unable to locate credentials. This works fine when my dask distributed setup runs against local worker cores. However, when I import a client with a…
4
votes
1 answer

Setting maximum number of workers in Dask map function

I have a Dask process that triggers 100 workers with a map function: worker_args = .... # array with 100 elements with worker parameters futures = client.map(function_in_worker, worker_args) worker_responses = client.gather(futures) I use docker…
ps0604
  • 1,227
  • 23
  • 133
  • 330
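
One general way to cap how many of the 100 mapped tasks run at once, independent of cluster size, is to throttle submissions: keep at most N futures in flight and submit the next argument only when one finishes. A stdlib sketch of that pattern (`work` and `max_in_flight` are illustrative, not dask API):

```python
from concurrent.futures import FIRST_COMPLETED, ThreadPoolExecutor, wait

def work(x):
    return x * x

worker_args = list(range(100))
max_in_flight = 10     # hypothetical cap on concurrently pending tasks

results = []
with ThreadPoolExecutor(max_workers=4) as pool:
    pending = set()
    for x in worker_args:
        if len(pending) >= max_in_flight:
            # Block until at least one task finishes before submitting more.
            done, pending = wait(pending, return_when=FIRST_COMPLETED)
            results.extend(f.result() for f in done)
        pending.add(pool.submit(work, x))
    done, _ = wait(pending)               # drain the remainder
    results.extend(f.result() for f in done)
```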
4
votes
1 answer

Dask multi-stage resource setup causes Failed to Serialize Error

Using the exact code from Dask's documentation at https://jobqueue.dask.org/en/latest/examples.html In case the page changes, this is the code: from dask_jobqueue import SLURMCluster from distributed import Client from dask import delayed cluster =…
michaelgbj
  • 290
  • 1
  • 10
4
votes
2 answers

Running two Tensorflow trainings in parallel using joblib and dask

I have the following code that runs two TensorFlow trainings in parallel using Dask workers implemented in Docker containers. I need to launch two processes, using the same dask client, where each will train their respective models with N…
ps0604
  • 1,227
  • 23
  • 133
  • 330
4
votes
1 answer

Dask: handling unresponsive workers

When using Dask with SGE or PBS clusters I sometimes have workers becoming unresponsive. These workers are highlighted in red in the dashboard's Info section, with their "Last seen" number constantly increasing. I know this can happen if submitted…
Thomas
  • 81
  • 7
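
Dask has its own knobs for dead or silent workers, but the defensive client-side pattern is general: bound how long you wait on any one result and flag tasks that exceed it. A stdlib sketch, where one task deliberately simulates a hung worker:

```python
from concurrent.futures import ThreadPoolExecutor, TimeoutError
import time

def maybe_hang(i):
    # Task 3 simulates a worker that stops responding.
    time.sleep(2 if i == 3 else 0.05)
    return i

with ThreadPoolExecutor(max_workers=4) as pool:
    futures = [pool.submit(maybe_hang, i) for i in range(5)]
    results = []
    for f in futures:
        try:
            results.append(f.result(timeout=1))   # bound the wait per task
        except TimeoutError:
            results.append(None)                  # mark the stuck task
```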
4
votes
1 answer

Dask aws cluster error when initializing: User data is limited to 16384 bytes

I'm following the guide here: https://cloudprovider.dask.org/en/latest/packer.html#ec2cluster-with-rapids In particular I set up my instance with packer, and am now trying to run the final piece of code: cluster = EC2Cluster( …
ZirconCode
  • 805
  • 2
  • 10
  • 24
4
votes
1 answer

Dask crashing when saving to file?

I'm trying to one-hot encode a dataset, then group by a specific column so I can get one row for each item in that column with an aggregated view of which one-hot columns are true for that row. It seems to be working on small data and using…
Lostsoul
  • 25,013
  • 48
  • 144
  • 239
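
The encode-then-aggregate step the question describes can be checked at the pandas level first; dask.dataframe mirrors `get_dummies` and `groupby`, so the same logic should carry over. A sketch with hypothetical data, taking the per-item `max()` so each item keeps a flag for every tag it ever carried:

```python
import pandas as pd

# Hypothetical data: one row per (item, tag) observation.
df = pd.DataFrame({"item": ["a", "a", "b"], "tag": ["x", "y", "x"]})

# One-hot encode the tag column, then aggregate per item with max() so
# a flag is set whenever any of the item's rows had that tag.
onehot = pd.get_dummies(df, columns=["tag"])
per_item = onehot.groupby("item").max().reset_index()
```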
4
votes
1 answer

Is there a way of using dask jobqueue over ssh

Dask jobqueue seems to be a very nice solution for distributing jobs to PBS/Slurm-managed clusters. However, if I'm understanding its use correctly, you must create an instance of "PBSCluster/SLURMCluster" on the head/login node. Then you can on the same…
4
votes
1 answer

Avoiding memory overflow while using xarray dask apply_ufunc

I need to apply a function along the time dimension of an xarray dask array of this shape: dask.array
4
votes
1 answer

Timeout OSError while running dask on local cluster

I am trying to run the following code on a Power PC with config: Operating System: Red Hat Enterprise Linux Server 7.6 (Maipo) CPE OS Name: cpe:/o:redhat:enterprise_linux:7.6:GA:server Kernel: Linux 3.10.0-957.21.3.el7.ppc64le …
Coddy
  • 549
  • 4
  • 18
4
votes
1 answer

dask.distributed SLURM cluster Nanny Timeout

I am trying to use the dask.distributed.SLURMCluster to submit batch jobs to a SLURM job scheduler on a supercomputing cluster. The jobs all submit as expected, but throw an error after 1 minute of running: asyncio.exceptions.TimeoutError: Nanny…
Ovec8hkin
  • 65
  • 1
  • 6