Questions tagged [dask-distributed]

Dask.distributed is a lightweight library for distributed computing in Python. It extends both the concurrent.futures and dask APIs to moderate-sized clusters.

1090 questions
0
votes
1 answer

How to split a csv into multiple csv files using Dask

How to split a csv file into multiple files using Dask? The below code seems to write to only one file, which takes a long time to write the full thing. I believe writing to multiple files would be faster. import dask.dataframe as ddf import…
mongotop
  • 7,114
  • 14
  • 51
  • 76
0
votes
2 answers

What could be the explanation of this "pyarrow.lib.ArrowIOError: HDFS file does not exist" error when trying to read files in hdfs using Dask?

I'm using Dask Distributed and I'm trying to create a dataframe from a CSV stored in HDFS. I suppose the connection to HDFS is successful as I'm able to print the dataframe columns' names. However, I get the following error when I'm trying to use…
Sevy
  • 15
  • 2
  • 6
0
votes
1 answer

HighLevelGraph with (local/multiprocessing) distributed

How should I use dask.highlevelgraph.HighLevelGraph in a local distributed setting? Sequential computation, e.g. result = dask.get(some_high_level_graph, [some_targets]), works. import dask from dask.highlevelgraph import HighLevelGraph as CG #…
stustd
  • 303
  • 1
  • 10
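For context, a HighLevelGraph is just a layered collection of low-level task graphs, and anything implementing the scheduler `get` interface can execute it. A sketch with made-up layer names and keys:

```python
import dask
from dask.highlevelgraph import HighLevelGraph

# Two layers: layer2's task consumes keys defined in layer1.
layers = {
    "layer1": {"x": 1, "y": 2},
    "layer2": {"z": (sum, ["x", "y"])},
}
dependencies = {"layer1": set(), "layer2": {"layer1"}}
hlg = HighLevelGraph(layers, dependencies)

# HighLevelGraph is a Mapping, so the synchronous scheduler runs it directly:
print(dask.get(hlg, "z"))  # 3

# A distributed Client implements the same get interface (sketch; requires
# a dask.distributed installation):
#   from dask.distributed import Client
#   client = Client()        # local cluster
#   client.get(hlg, "z")
```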
0
votes
1 answer

Problem parallelizing dask code on a single machine

Parallelizing with dask is slower than sequential code. I have nested for loops which I am trying to parallelize on a local cluster but can't find the right way. I want to parallelize the inner loop. I have 2 big numpy matrices which I am trying to…
netfr
  • 1
  • 4
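For large numpy matrices, a common fix is to let dask.array chunk the arrays so block operations run in parallel; chunks that are too small make scheduling overhead dominate, which is the usual reason "parallel" dask code ends up slower than plain numpy. A sketch with made-up shapes:

```python
import numpy as np
import dask.array as da

rng = np.random.default_rng(0)
a = rng.random((400, 200))
b = rng.random((200, 400))

# Moderately large chunks: each block matmul is real work, so the
# per-task scheduling cost is amortized.
da_a = da.from_array(a, chunks=(100, 200))
da_b = da.from_array(b, chunks=(200, 100))

result = (da_a @ da_b).compute()
print(result.shape)  # (400, 400)
```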
0
votes
1 answer

Right way to set memory parameters for LocalCluster in dask

I tried the code below, from dask.distributed import Client, LocalCluster worker_kwargs = { 'memory_limit': '2G', 'memory_target_fraction': 0.6, 'memory_spill_fraction': 0.7, 'memory_pause_fraction': 0.8, …
zyxue
  • 7,904
  • 5
  • 48
  • 74
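A hedged sketch of the current arrangement: `memory_limit` is a per-worker `LocalCluster` keyword, but in recent versions the target/spill/pause fractions are read from the dask config (`distributed.worker.memory.*`) rather than passed as worker kwargs:

```python
import dask
from dask.distributed import Client, LocalCluster

# The fractions live in the config, not in worker kwargs (assumed for
# recent dask/distributed versions):
dask.config.set({
    "distributed.worker.memory.target": 0.6,
    "distributed.worker.memory.spill": 0.7,
    "distributed.worker.memory.pause": 0.8,
})

# memory_limit stays a per-worker keyword; processes=False keeps the
# workers in-process so this sketch starts quickly.
cluster = LocalCluster(n_workers=2, threads_per_worker=1,
                       memory_limit="2GB", processes=False)
client = Client(cluster)
n_workers_started = len(cluster.workers)
print(n_workers_started)  # 2

client.close()
cluster.close()
```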
0
votes
1 answer

How to specify dask client via environment variable

How can I instruct dask to use a distributed Client as the scheduler, externally from the code, e.g. via an environment variable? The motivation is to take advantage of one of the key features of dask - namely the transparency of going from a single…
stav
  • 1,497
  • 2
  • 15
  • 40
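A sketch of the environment-variable route: dask maps `DASK_*` environment variables into its config (double underscores become nested keys), and a `Client()` created with no arguments consults the configured scheduler address. The address below is made up:

```python
import os
import dask

# Set the scheduler address from the environment, as you would outside
# the code (e.g. in a shell or container definition).
os.environ["DASK_SCHEDULER_ADDRESS"] = "tcp://127.0.0.1:8786"
dask.config.refresh()  # re-read environment variables into the config

print(dask.config.get("scheduler_address"))
# With this set, Client() with no arguments connects to that scheduler,
# so the same code runs locally or distributed without modification.
```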
0
votes
1 answer

dask dataframe: merging two dataframes, imputing missing values and writing to csv only uses part of each CPU (~20% per CPU)

I want to merge two dask dataframes, impute missing values with the column median, and export the merged dataframe to csv files. I have one problem: my current code cannot utilize all 8 CPUs (~20% of each CPU). I am not sure which part limits the CPU…
Jin Wang
  • 1
  • 1
0
votes
1 answer

Reshape, concatenate and aggregate multiple pandas DataFrames

I have five different pandas data frames showing results of calculations done on the same data with the same number of samples; all the arrays are identical in shape (5x10). df shape for each data set: (recording channels) 0 1 2 3 4 5 6 7 8…
abhishake
  • 131
  • 1
  • 12
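For same-shaped frames, one common approach is to stack them with a named key level and then aggregate across that level. A sketch with made-up values:

```python
import numpy as np
import pandas as pd

# Five same-shaped (5 x 10) frames, e.g. one per recording run.
frames = [pd.DataFrame(np.arange(50).reshape(5, 10) + i) for i in range(5)]

# concat with keys adds a "run" index level; grouping on the "row" level
# then aggregates across runs, keeping the original 5 x 10 shape.
stacked = pd.concat(frames, keys=range(5), names=["run", "row"])
mean_across_runs = stacked.groupby(level="row").mean()
print(mean_across_runs.shape)  # (5, 10)
```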
0
votes
1 answer

How to get results of tasks when they finish and not after all have finished in Dask?

I have a dask dataframe and want to compute some tasks that are independent. Some tasks are faster than others, but I'm only getting the result of each task after the longer tasks have completed. I created a local Client and use client.compute() to send…
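This is what `as_completed` is for: it yields futures in the order they finish rather than the order they were submitted. A minimal sketch:

```python
import time
from dask.distributed import Client, as_completed

def work(delay):
    time.sleep(delay)
    return delay

client = Client(processes=False)  # in-process cluster for the sketch
futures = client.map(work, [0.3, 0.1, 0.2])

# Fast tasks come back immediately; no waiting on the slow ones.
done_order = [f.result() for f in as_completed(futures)]
print(done_order)
client.close()
```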
0
votes
1 answer

How to get task result in dask scheduler plugin

I want to forward the result of a task with a scheduler plugin in dask. I have a class that is registered, and when I log inside the transition function it shows: transition: key=, start=processing, finish=memory, *args=(), **kwargs={'worker':…
Matt Nicolls
  • 173
  • 1
  • 7
0
votes
1 answer

How do I ignore a worker whose tasks have failed and redistribute its tasks to other workers?

I was running a function on a pool of N single-threaded workers (on N machines) with client.map and one of the workers failed. I was wondering if there is a way to automatically handle exceptions raised by a worker, to redistribute its failed tasks…
billiam
  • 132
  • 1
  • 15
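Two relevant mechanisms, sketched below: `retries=` on `client.map`/`submit` asks the scheduler to re-run a failed task (on another worker if one is available) before marking it errored, and errored futures can be inspected individually via their `status` so the rest of the results are still usable:

```python
from dask.distributed import Client, as_completed

def work(x):
    if x == 2:                       # simulate a task that always fails
        raise ValueError("bad input")
    return x * 10

client = Client(processes=False)

# retries=1 re-runs a failed task once before giving up; here the failure
# is deterministic, so that one future still ends up in the "error" state.
futures = client.map(work, [1, 2, 3], retries=1)

ok, failed = [], []
for f in as_completed(futures):
    (ok if f.status == "finished" else failed).append(f)

ok_values = sorted(f.result() for f in ok)
print(ok_values, len(failed))
client.close()
```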
0
votes
1 answer

Can I retrieve a distributed.client instance if I know its id?

With dask there is an id associated with each instance of distributed.client. Calling .id on a client will show its id. Can I retrieve a client instance if I know its id?
billiam
  • 132
  • 1
  • 15
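As far as I know there is no public registry mapping ids back to instances, but the current client in a process (or inside a task) is recoverable without the id via `get_client()`. A sketch:

```python
from dask.distributed import Client, get_client

client = Client(processes=False)
print(client.id)  # e.g. 'Client-1a2b…'

# get_client() returns the active client in this process (and, inside a
# running task, the client of the worker executing it).
same = get_client()
print(same is client)
client.close()
```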
0
votes
1 answer

Dask on single OSX machine - is it parallel by default?

I have installed Dask on OSX Mojave. Does it execute computations in parallel by default? Or do I need to change some settings? I am using the DataFrame API. Does that make a difference to the answer? I installed it with pip. Does that make a…
power
  • 1,680
  • 3
  • 18
  • 30
0
votes
1 answer

How to parallelize a nested loop with dask.distributed?

I am trying to parallelize a nested loop using dask.distributed that looks like this: @dask.delayed def delayed_a(e): a = do_something_with(e) return something @dask.delayed def delayed_b(element): computations = [] for e in element: …
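The usual shape of the answer, sketched with made-up function names: decorate only the leaf work with `@delayed`, turn the inner loop into a list of lazy tasks, and execute everything with a single `dask.compute` call:

```python
import dask
from dask import delayed

@delayed
def square(e):
    # stand-in for the real per-element work
    return e * e

def process(element):
    # the inner loop builds lazy tasks instead of running eagerly
    return [square(e) for e in element]

elements = [[1, 2], [3, 4, 5]]
tasks = [t for element in elements for t in process(element)]

# one compute call runs all tasks in parallel on the local scheduler
results = dask.compute(*tasks)
print(results)  # (1, 4, 9, 16, 25)
```

Decorating the outer function with `@delayed` as well tends to serialize the work, because the inner tasks are then built inside a single outer task.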
0
votes
2 answers

Process pool on DASK

I am new to Dask. I can submit 10 tasks using client.map(funct_name, iterator), where the iterator is a list containing the 10 elements. Now I want to submit the next task, say the 11th, when any one of the earlier 10 submitted tasks is…
Mahendra Gaur
  • 380
  • 2
  • 11
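A sketch of the sliding-window pattern: seed `as_completed` with the first batch, then `add` a new future each time one finishes, keeping a fixed number of tasks in flight (batch sizes and the work function are made up):

```python
from dask.distributed import Client, as_completed

def work(x):
    return x * 2

client = Client(processes=False)

data = list(range(25))
first, rest = data[:10], data[10:]

# Start with 10 in-flight tasks; as each finishes, submit the next one.
ac = as_completed(client.map(work, first))
results = []
for fut in ac:
    results.append(fut.result())
    if rest:
        ac.add(client.submit(work, rest.pop(0)))

print(sorted(results))
client.close()
```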