Highest Voted 'dask-distributed' Questions

0 votes, 1 answer

Design computation graph in dask

Until now, I've used dask with get and a dictionary to define the dependency graph of my tasks. But that means I have to define the whole graph up front, and now I want to add new tasks from time to time (with dependencies on old…

asked Jun 15 '18 at 15:57

user1769471

29
3
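
A minimal sketch of the idea in the question above: adding tasks over time that depend on earlier ones, instead of fixing the whole graph up front. It uses only the standard library's concurrent.futures, not dask itself; dask.distributed's Client offers a similar submit interface, but nothing here is taken from the original question or its answer. The helper `add_task` and the task names are hypothetical.

```python
from concurrent.futures import ThreadPoolExecutor


def add_task(executor, futures, name, func, deps=()):
    """Submit `func` under `name`; its arguments are the results of `deps`."""
    # Look up dependencies now, so an unknown name fails immediately (KeyError).
    dep_futures = [futures[d] for d in deps]

    def run():
        # Dependencies were submitted earlier, so they are already running
        # or finished by the time this task starts.
        return func(*(f.result() for f in dep_futures))

    futures[name] = executor.submit(run)
    return futures[name]


with ThreadPoolExecutor(max_workers=2) as pool:
    tasks = {}
    add_task(pool, tasks, "a", lambda: 1)
    add_task(pool, tasks, "b", lambda x: x + 1, deps=["a"])
    # Later on, a new task can depend on old ones without rebuilding anything.
    add_task(pool, tasks, "c", lambda x, y: x + y, deps=["a", "b"])
    total = tasks["c"].result()
```

The registry is just a dict from task name to future, so new tasks can be added at any point as long as their dependencies were registered first.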

0 votes, 1 answer

Can't find dependencies/Dependent not found error

I am trying to run this benchmark on a small dask cluster made of two nodes. The remote worker is simply deployed with the dask-worker command and it appears correctly in the output of client in the benchmark. I've also tried to run some simple…

python dask dask-distributed

asked May 28 '18 at 09:59

Aratz

430
5
16

0 votes, 1 answer

Dask Memory Error Grouping DF From Parquet Data

I created a parquet dataset by reading data into a pandas df, using get_dummies() on the data, and writing it to a parquet file: df = pd.read_sql(query, engine) encoded = pd.get_dummies(df,…

python pandas parquet dask dask-distributed

asked Apr 27 '18 at 15:22

OverflowingTheGlass

2,324
1
27
75

0 votes, 1 answer

Dask: same tasks are not running in parallel on a cluster of Ubuntu machines

I have 3 Ubuntu machines (CPU). My dask scheduler and client are both on the same machine, whereas the two dask workers are running on the other two machines. When I launch the first task, it gets scheduled on the first worker, but then upon launching…

dask dask-distributed

asked Apr 20 '18 at 05:16

TheCodeCache

820
1
7
27

0 votes, 1 answer

Is there any way to know whether a dask-worker is running on a CPU device or a GPU device?

Suppose a dask cluster has some CPU devices as well as some GPU devices. Each device runs a single dask-worker. Now, the question is how do I find out whether the underlying device of a dask-worker is a CPU or a GPU. For example, if the dask-worker is running…

dask dask-distributed

asked Apr 19 '18 at 10:41

TheCodeCache

820
1
7
27

0 votes, 1 answer

Simplest way to create a complex dask graph

There is a complex system of calculations over some objects. The difficulty is that some calculations are group calculations. This can be demonstrated by the following example: from dask.distributed import Client def load_data_from_db(id): # load…

python dask dask-distributed

asked Jan 22 '18 at 13:00

Vladimir

145
2
9

0 votes, 0 answers

dask jobs hang indefinitely and inconsistently

I am running multiple concurrent dask jobs using the dask client submit API. I have come across this issue multiple times. A thread dump of the specific worker shows the information below. Can someone guide me on this problem? ts_data =…

dask dask-distributed

asked Jan 15 '18 at 06:49

Santosh Kumar

761
5
28

0 votes, 1 answer

dask distributed.utils - ERROR - state is not a dictionary

I recently upgraded dask-0.15.3 to dask-0.16.0 and distributed-1.19.1 to distributed-1.20.2. After the upgrade, all dask jobs fail with the exception: _pickle.UnpicklingError: state is not a dictionary. Please let me know if I am missing any…

dask dask-distributed

asked Dec 22 '17 at 08:21

Santosh Kumar

761
5
28

0 votes, 1 answer

How long does dask-ec2 take to set up instances?

I am new to dask.distributed. I am trying to set up a few clusters for distributed jobs, and I am using dask-ec2 to set them up. When I run the command with the required arguments, it gets stuck at the worker installation task. I killed it after 30 minutes. I am using port…

dask dask-distributed

asked Dec 01 '17 at 13:37

Naresh Kumar

3
2

0 votes, 1 answer

Dask DataFrame.map_partitions() to write to db table

I have a dask dataframe that contains some data after some transformations. I want to write that data back to a MySQL table. I have implemented a function that takes a dataframe and a db url and writes the dataframe back to the database. Because I need…

python mysql dask dask-distributed

asked Nov 30 '17 at 15:39

Apostolos

7,763
17
80
150

0 votes, 1 answer

Python + Distributed - Is it possible using Dask to utilize a set of workers to apply a function to separate files from a folder concurrently?

I want to write a program that calculates the time it takes to read in a folder of .py files and calculate the cyclomatic complexity of each of the files. I have Radon installed to calculate the complexity, but I also want to be able to implement a…

python concurrency dask distributed-system dask-distributed

asked Nov 28 '17 at 22:12

J.Doe

21
3
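
A rough sketch of the pattern the question above describes: apply one function to every .py file in a folder concurrently and time the whole run. It uses the standard library's thread pool rather than a dask cluster; dask.distributed's Client has a map method of a similar shape, but this is not taken from the original post. `count_nonblank_lines` is a hypothetical stand-in for the real per-file metric (for example a Radon complexity call), and `measure_folder` is likewise an illustrative name.

```python
import time
from concurrent.futures import ThreadPoolExecutor
from pathlib import Path


def count_nonblank_lines(path):
    # Hypothetical stand-in for the real per-file metric.
    with open(path, encoding="utf-8") as handle:
        return sum(1 for line in handle if line.strip())


def measure_folder(folder, func=count_nonblank_lines, max_workers=4):
    """Apply `func` to each .py file in `folder`; return (results, seconds)."""
    files = sorted(Path(folder).glob("*.py"))
    start = time.perf_counter()
    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        # map() preserves input order, so results line up with `files`.
        values = list(pool.map(func, files))
    elapsed = time.perf_counter() - start
    return dict(zip((f.name for f in files), values)), elapsed
```

Calling `measure_folder("some_folder")` returns a dict keyed by file name plus the elapsed wall-clock time in seconds; the folder name here is only a placeholder.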

0 votes, 1 answer

I have a collection of futures which are the result of persist on a dask dataframe. How do I do a delayed operation on them?

I have set up a scheduler and 4 worker nodes to do some processing on a CSV. The size of the CSV is just 300 MB. df = dd.read_csv('/Downloads/tmpcrnin5ta',assume_missing=True) df = df.groupby(['col_1','col_2']).agg('mean').reset_index() df =…

dask dask-distributed

asked Nov 23 '17 at 13:38

Naresh Kumar

3
2

0 votes, 1 answer

Read dask dataframe from parallel txt files

I have two (or more) parallel text files stored in S3 - i.e. line 1 in first file corresponds to line 1 in second file etc. I want to read these files as columns into a single dask dataframe. What would be the best/easiest/fastest way to do it? PS.…

dask dask-distributed

asked Oct 18 '17 at 16:18

evilkonrex

255
2
10

0 votes, 0 answers

Dask dataframe error while reading from HDFS

Here is the code that I am using to connect to HDFS and create a dask dataframe. Client(scheduler_host+":"+scheduler_port) df=dd.read_csv("hdfs://hdfs_host/") Error: AttributeError: /usr/lib/libhdfs3.so: undefined symbol:…

hadoop dask dask-distributed

asked Sep 21 '17 at 23:38

Santosh Kumar

761
5
28

0 votes, 1 answer

subselection of columns in dask (from pandas) by computed boolean indexer

I'm new to dask (imported as dd) and am trying to convert some pandas (imported as pd) code. The goal of the following lines is to slice the data to those columns whose values fulfill the calculated requirement in dask. There is a given table in…

python slice dask dask-distributed

asked Aug 18 '17 at 09:39

Bastian Ebeling

1,138
11
38
