Questions tagged [dask-distributed]

Dask.distributed is a lightweight library for distributed computing in Python. It extends both the concurrent.futures and dask APIs to moderate-sized clusters.

1090 questions
4 votes, 1 answer

How to check if a dask dataframe is empty when lazily evaluated?

I am aware of this question. But check the code (minimal working example) below: import dask.dataframe as dd import pandas as pd # initialise data of lists. data = {'Name': ['Tom', 'nick', 'krish', 'jack'], 'Age': [20, 21, 19, 18]} # Create…
MehmedB • 1,059 • 1 • 16 • 42
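
A minimal sketch of one common answer, using the question's own sample data: count rows per partition and sum the counts, which triggers computation of the counts only, never materialising the whole frame in one place.

```python
import dask.dataframe as dd
import pandas as pd

# Sample data mirroring the question's excerpt.
pdf = pd.DataFrame({'Name': ['Tom', 'nick', 'krish', 'jack'],
                    'Age': [20, 21, 19, 18]})
ddf = dd.from_pandas(pdf, npartitions=2)

# A lazily filtered frame whose emptiness is unknown until computed.
filtered = ddf[ddf.Age > 100]

# Count rows per partition and sum: computes only the counts.
is_empty = filtered.map_partitions(len).sum().compute() == 0
print(is_empty)  # True
```
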
4 votes, 0 answers

dask 100 GB dataframe sorting / set_index on new column: out-of-memory issues

I have a dask dataframe of around 100GB and 4 columns that does not fit into memory. My machine is an 8-core Xeon with 64GB of RAM with a local Dask cluster. I converted the dataframe to 150 partitions (700MB each). However, my simple set_index()…
user670186 • 2,588 • 6 • 37 • 55
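
A hedged sketch of one way to keep the shuffle's memory use bounded; the parquet path and column name are stand-ins, and on dask versions of that era the keyword is shuffle= (newer releases spell it shuffle_method=).

```python
import dask.dataframe as dd

# Hypothetical source; substitute the real 100 GB dataset.
ddf = dd.read_parquet('data/*.parquet')

# set_index on an unsorted column forces a full shuffle. Routing the
# shuffle through disk bounds peak memory at the cost of extra I/O.
ddf = ddf.set_index('new_column', shuffle='disk')

# Writing the result back out means the expensive sort happens once.
ddf.to_parquet('data_sorted/')
```
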
4 votes, 2 answers

How to use Dask on Databricks

I want to use Dask on Databricks. It should be possible (I cannot see why not). If I import it, one of two things happens: either I get an ImportError, or, when I install distributed to solve this, Databricks just says Cancelled without throwing any…
SARose • 3,558 • 5 • 39 • 49
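
A minimal sketch of the usual workaround, assuming the problem is worker-process start-up inside the notebook; whether this fits the asker's Databricks runtime is untested here.

```python
# Install inside the notebook first (hypothetical, depends on the runtime):
# %pip install dask distributed

from dask.distributed import Client

# Threads-only, in-process client on the driver node:
# nothing to fork, nothing for the platform to cancel.
client = Client(processes=False)
print(client)
```
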
4 votes, 0 answers

PyCharm debugger throws a Bad file descriptor error when using dask distributed

I am using the most lightweight/simple dask multiprocessing setup, which is the non-cluster local Client: from distributed import Client client = Client() Even so, the first invocation of dask.bag.compute() results in the following: Connected to…
WestCoastProjects • 58,982 • 91 • 316 • 560
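
One commonly suggested workaround, assuming the debugger is tripping over the forked workers' pipes and file descriptors: run everything in-process while debugging.

```python
from distributed import Client
import dask.bag as db

# In-process (threaded) client: no child processes, so the debugger
# has no inherited file descriptors to trip over.
client = Client(processes=False)

result = db.from_sequence(range(10)).map(lambda x: x * 2).compute()
print(result)
```
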
4 votes, 1 answer

Initializing state on dask-distributed workers

I am trying to do something like resource = MyResource() def fn(x): something = dosomething(x, resource) return something client = Client() results = client.map(fn, data) The issue is that resource is not serializable and is expensive to…
Daniel Mahler • 7,653 • 5 • 51 • 90
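
A sketch of the standard pattern: build the resource once per worker and cache it on the worker object, so it is never serialized or rebuilt per task. MyResource is a stand-in for the asker's class; Client.register_worker_callbacks is an alternative hook.

```python
from distributed import Client, get_worker

class MyResource:            # stand-in for the asker's expensive class
    def query(self, x):
        return x * 2

def get_resource():
    # Build once per worker and cache on the worker object.
    worker = get_worker()
    if not hasattr(worker, '_resource'):
        worker._resource = MyResource()
    return worker._resource

def fn(x):
    return get_resource().query(x)

client = Client()
futures = client.map(fn, range(10))
print(client.gather(futures))
```
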
4 votes, 1 answer

dask read_csv timeout on Amazon S3 with big files

dask read_csv timeout on s3 for big files s3fs.S3FileSystem.read_timeout = 5184000 # one day s3fs.S3FileSystem.connect_timeout = 5184000 # one day client = Client('a_remote_scheduler_ip_here:8786') df =…
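
A sketch of the usual fix: pass the timeouts through storage_options so each worker's s3fs session picks them up; patching the class attribute only affects the client process, not the workers that actually read the file. The bucket path is a stand-in.

```python
import dask.dataframe as dd
from dask.distributed import Client

client = Client('a_remote_scheduler_ip_here:8786')

# storage_options is forwarded to s3fs.S3FileSystem on every worker;
# config_kwargs feeds botocore's Config (read_timeout/connect_timeout).
df = dd.read_csv(
    's3://some-bucket/big-file-*.csv',  # hypothetical path
    storage_options={'config_kwargs': {
        'read_timeout': 5184000,
        'connect_timeout': 5184000,
    }},
)
```
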
4 votes, 1 answer

How do I use dask to efficiently calculate many simple statistics

Problem: I want to calculate a bunch of "easy to gather" statistics using Dask. Speed is my primary concern, so I am looking to throw a wide cluster at the problem. Ideally, I would like to finish the described problem in less than…
bluecoconut • 63 • 1 • 5
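
A minimal sketch of the key idea: build every statistic lazily and compute them in a single call, so the scheduler shares one pass over the data across all of the aggregations. The path and column name are stand-ins.

```python
import dask
import dask.dataframe as dd

ddf = dd.read_parquet('data/*.parquet')  # hypothetical source

# Lazy expressions only; nothing runs yet.
stats = {
    'mean': ddf.x.mean(),    # column name 'x' is an assumption
    'std': ddf.x.std(),
    'count': ddf.x.count(),
    'max': ddf.x.max(),
}

# One compute() call lets dask merge the graphs and read each
# partition once for all four statistics.
(results,) = dask.compute(stats)
print(results)
```
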
4 votes, 0 answers

Dask Distributed client takes too long to initialize in JupyterLab

Trying to initialize a client with a local cluster in JupyterLab, but it hangs. This behaviour happens with Python 3.5 and JupyterLab 0.35. import dask.dataframe as dd from dask import delayed from distributed import Client from distributed import…
Apostolos • 7,763 • 17 • 80 • 150
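
A hedged sketch that often makes such hangs diagnosable: build the LocalCluster explicitly and in-process, so start-up errors surface instead of blocking while worker processes spawn.

```python
from distributed import Client, LocalCluster

# Explicit cluster construction: failures raise here rather than
# hanging inside Client(); processes=False avoids spawning workers,
# the usual culprit in restricted notebook environments.
cluster = LocalCluster(processes=False)
client = Client(cluster)
print(client)
```
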
4 votes, 1 answer

Tornado unexpected exception in Future after timeout

I have set up a dask cluster. I can access a web dashboard, but when I'm trying to connect to the scheduler: from dask.distributed import Client client = Client('192.168.0.10:8786') I get the following error: tornado.application - ERROR - Exception…
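
A frequent cause of this error is mismatched dask/distributed/tornado versions between client, scheduler, and workers. If the connection itself succeeds, this built-in check raises on any mismatch.

```python
from dask.distributed import Client

client = Client('192.168.0.10:8786')

# Compares package versions across client, scheduler, and all
# workers; raises if they differ.
client.get_versions(check=True)
```
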
4 votes, 1 answer

How to assign tasks to a specific worker within Dask.Distributed

I am interested in using Dask Distributed as a task executor. In Celery it is possible to assign a task to a specific worker. How can this be done with Dask Distributed?
Sklavit • 2,225 • 23 • 29
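
A minimal sketch of the documented mechanism: the workers= argument of submit/map pins a task to named worker addresses. The scheduler address, worker address, and task are stand-ins.

```python
from dask.distributed import Client

client = Client('scheduler-address:8786')  # hypothetical address

def process(x):                            # hypothetical task
    return x * 2

# workers= restricts placement to the listed worker address(es);
# allow_other_workers=False makes the constraint strict.
future = client.submit(process, 42,
                       workers=['tcp://10.0.0.5:12345'],
                       allow_other_workers=False)
print(future.result())
```
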
4 votes, 1 answer

dask: how to define a custom (time fold) function that operates in parallel and returns a dataframe with a different shape

I am trying to implement a time fold function to be mapped over various partitions of a dask dataframe, which in turn changes the shape of the dataframe in question (or alternatively produces a new dataframe with the altered shape). This is how far I…
PhaKuDi • 141 • 8
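
A self-contained sketch of the usual mechanism: map_partitions with an explicit meta describing the output schema, so the result may have a different shape than the input. The hourly data and daily aggregation are stand-ins for the asker's fold.

```python
import pandas as pd
import dask.dataframe as dd

# Hypothetical time-indexed frame: 4 days of hourly data, one day
# per partition.
idx = pd.date_range('2019-01-01', periods=96, freq='H')
ddf = dd.from_pandas(pd.DataFrame({'value': range(96)}, index=idx),
                     npartitions=4)

def time_fold(part: pd.DataFrame) -> pd.DataFrame:
    # Returns one row per day: a different shape than the input.
    return part.resample('1D').mean()

# meta declares the output schema, so dask accepts a function whose
# result differs in shape from the input partitions.
meta = pd.DataFrame({'value': pd.Series(dtype='float64')},
                    index=pd.DatetimeIndex([]))
folded = ddf.map_partitions(time_fold, meta=meta)
print(folded.compute())
```
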
4 votes, 1 answer

Running shell commands in parallel using dask distributed

I have a folder with a lot of .sh scripts. How can I use an already set up dask distributed cluster to run them in parallel? Currently, I am doing the following: import dask, distributed, os # list with shell commands that I want to run commands =…
Arco Bast • 3,595 • 2 • 26 • 53
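
A sketch of one straightforward approach: wrap each command in subprocess.run and fan it out with client.map. The scheduler address and script names are stand-ins; pure=False matches the question's side-effecting tasks.

```python
import subprocess
from distributed import Client

client = Client('scheduler-address:8786')  # hypothetical address

def run_script(cmd):
    # Run one shell command on a worker; return code and output
    # travel back to the client for inspection.
    proc = subprocess.run(cmd, shell=True, capture_output=True, text=True)
    return proc.returncode, proc.stdout

commands = ['./a.sh', './b.sh']            # stand-ins for the folder's scripts
futures = client.map(run_script, commands, pure=False)  # rerun on each call
results = client.gather(futures)
```
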
4 votes, 0 answers

Optimal approach to create a dask dataframe from parquet files (HDFS) in different directories

I am trying to create a dask dataframe from a large number of parquet files stored in different HDFS directories. I have tried two approaches, but both of them seem to take a very long time. Approach 1: call the API read_parquet with a glob path.…
Santosh Kumar • 761 • 5 • 28
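
A hedged sketch of the usual mitigation: pass an explicit list of paths instead of one giant recursive glob, and skip per-file statistics so not every footer is opened up front. The paths are stand-ins, and gather_statistics= is the keyword on dask versions of that era (newer releases renamed it).

```python
import dask.dataframe as dd

# Hypothetical directory layout; substitute the real HDFS paths.
paths = [f'hdfs:///data/dir{i}/*.parquet' for i in range(20)]

# Explicit path list + no statistics gathering keeps graph
# construction cheap at the cost of unknown divisions.
ddf = dd.read_parquet(paths, engine='pyarrow', gather_statistics=False)
```
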
4 votes, 0 answers

Dask: restart worker(s) using the client

Is there a way, using the dask client, to restart a worker or a provided list of workers? I need a way to bounce a worker after a task is executed, to reset process state that may have been changed by the execution. Client.restart() restarts the entire…
Ameet Shah • 61 • 1 • 4
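
As far as I know there is no per-worker restart on the classic client API; a hedged sketch of the closest built-in, which drains and removes the named workers so an external supervisor (or the nanny process) can start fresh replacements.

```python
from distributed import Client

client = Client('scheduler-address:8786')  # hypothetical address

# Gracefully moves data off the named workers and retires them;
# a supervisor restarting dask-worker brings clean state back.
client.retire_workers(workers=['tcp://10.0.0.5:12345'])
```
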
4 votes, 1 answer

Redistribute dask tasks among the cluster

I am abusing dask as a task scheduler for long-running tasks with map(…, pure=False). I am not interested in the dask graph; I just use dask as a way to distribute unix commands. Let's say I have 1000 tasks and they run for a week on a cluster of…
MaxBenChrist • 547 • 3 • 9
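
A hedged note in code form: rebalance() moves data, not tasks; queued tasks are redistributed by the scheduler's work stealing once workers fall idle, so long-running clusters usually self-balance. The address is a stand-in.

```python
from distributed import Client

client = Client('scheduler-address:8786')  # hypothetical address

# Evens out *stored results* across workers; task placement itself
# is handled continuously by the scheduler's work stealing.
client.rebalance()
```
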