Dask.distributed is a lightweight library for distributed computing in Python. It extends both the concurrent.futures and dask APIs to moderate-sized clusters.
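As a rough illustration of the concurrent.futures style the description refers to, here is a minimal sketch that uses only the standard library. The names `square` and `run_squares` are hypothetical and exist only for this example. With Dask.distributed installed, a `distributed.Client` offers a similar `submit` method that returns futures with a `result()` method, so the same pattern should carry over to a cluster, although this sketch does not exercise a cluster itself.

```python
from concurrent.futures import ThreadPoolExecutor


def square(x):
    """Toy task standing in for real work (hypothetical example function)."""
    return x * x


def run_squares(values):
    """Submit one task per value and collect the results in input order."""
    with ThreadPoolExecutor(max_workers=4) as executor:
        # submit() returns immediately with a Future for each task.
        futures = [executor.submit(square, v) for v in values]
        # result() blocks until the corresponding task has finished.
        return [f.result() for f in futures]


results = run_squares(range(5))
print(results)
```

Collecting results by iterating over the futures list keeps the output aligned with the input order, regardless of which task finishes first.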
Questions tagged [dask-distributed]
1090 questions
0
votes
1 answer
Exception raised when using client.scatter(df) in Dask.distributed
I'm working with Dask on Kubernetes using the Helm Chart in the stable/dask repository. When using the distributed Client and calling client.scatter(ddf), I'm getting an Exception as follows:
Exception: No module named…

GHayes
0
votes
1 answer
How to programmatically get the Dask-YARN UI URL
I am using Dask YARN to create an application like this:
spec = skein.ApplicationSpec( ... )
cluster = YarnCluster.from_specification(spec)
client = Client(cluster)
Ordinarily I'd then run yarn application -list from the command line and get the…

gallamine
0
votes
1 answer
How can I combine sequential as well as parallel execution of delayed function calls?
I am stuck in a strange place. I have a bunch of delayed function calls that I want to execute in a certain order. While executing in parallel is trivial:
res = client.compute([myfuncs])
res = client.gather(res)
I can't seem to find a way to…

suvayu
0
votes
1 answer
Submit dask arrays to distributed client while using results at the same time
I have dask arrays that represent frames of a video and want to create multiple video files. I'm using the imageio library, which allows me to "append" the frames to an ffmpeg subprocess. So I may have something like this:
my_frames = [[arr1f1,…

djhoese
0
votes
1 answer
Dask workers on Kubernetes cannot find csv file
I have set up Dask and JupyterHub on a Kubernetes cluster using Helm with the help of the Dask documentation: http://docs.dask.org/en/latest/setup/kubernetes.html.
Everything deployed fine and I can access the JupyterLab. Then I've created a notebook…

Stanko
0
votes
1 answer
Dask dataframe reshuffling on many parquet files
I have a dask cluster spread across many worker nodes.
I also have an S3 bucket with many parquet files (right now 500k files, possibly three times that in the future).
The data in the parquet is mostly text:
[username, first_name, last_name,…

t_z
0
votes
1 answer
How to create a Dask dataframe from a data string separated by tabs and newline characters
I have my data in the form of a string separated by the \ character (for columns) and by the newline \n character (for rows).
ID\Product\quantity\n1\xx\2
It looks like Dask.array.from_array() supports only an array as input.
Although I can convert the above text to…

naresh chava
0
votes
1 answer
Dask cannot read a file that pandas can
I have a csv file that can be read using pandas but fails with dask dataframe.
I am using the exact same parameters and still getting an error with dask.
Pandas use case:
import pandas as pd
mycols = ['id', 'tran_id', 'client_id', 'm_text', 'retry',…

shantanuo
0
votes
1 answer
disable errors while reading csv file
Does dask dataframe pass the error_bad_lines parameter to the pandas DataFrame class?
In other words, this does not seem to work, because I get an error when I try to run a groupby query.
df = dd.read_csv('s3://todel162xx/some.csv' , error_bad_lines=False,…

shantanuo
0
votes
1 answer
Cannot load large files using aws-fargate ecs
I tried to follow the instructions mentioned on this page...
https://towardsdatascience.com/serverless-distributed-data-pre-processing-using-dask-amazon-ecs-and-python-part-1-a6108c728cc4
And got 2 errors. One is related to the IAM role and the other is…

shantanuo
0
votes
1 answer
Dask.distributed cluster administration
I'm setting up a Dask Python cluster at work (30 machines, 8 cores each on average). People use only a portion of their CPU power, so dask-workers will be running in the background at low priority. All workers are listening to the dask-scheduler on my master…

stkubr
0
votes
1 answer
Send SIGTERM to the running task, dask distributed
When I submit a small Tensorflow training as a single task, it launches additional threads. When I press Ctrl+C and raise KeyboardInterrupt, my task is closed, but the underlying threads are not cleaned up and training continues.
Initially, I was thinking…

Vladyslav Moisieienkov
0
votes
1 answer
Dask dashboard not starting when starting scheduler with api
I've set up a distributed system using dask. When I start the scheduler using the Python API, the dask scheduler doesn't mention starting the dashboard. As expected, I cannot reach it at the address where I would expect it to be.
Since bokeh is…

mathivh
0
votes
2 answers
scrapy getting stuck after some time
I have a master-worker network on AWS EC2 using the dask distributed library. For now I have one master machine and one worker machine. The master has a REST API (Flask) for scheduling scrapy jobs on the worker machine. I am using docker for both master and…

suraj deshmukh
0
votes
1 answer
dask distributed, fails to start worker
There are cases where it seems the dask cluster hangs upon restart.
To simulate this I have written this stupid code:
import contextlib2
from distributed import Client, LocalCluster
for i in xrange(100):
    print i
    with…

sami