Questions tagged [dask-delayed]

Dask.Delayed refers to the Python interface built around the delayed function, which wraps a function or object to create Delayed proxies. Use this tag for questions about that interface.
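As a rough illustration of what the tag description means by "Delayed proxies", here is a minimal sketch. The helper functions `inc` and `add` are made up for this example; only `dask.delayed` and `.compute()` come from Dask itself.

```python
from dask import delayed

# Hypothetical helper functions, used only for illustration.
def inc(x):
    return x + 1

def add(x, y):
    return x + y

# Calling a delayed-wrapped function does not run it; it returns a
# Delayed proxy and records the call in a task graph.
a = delayed(inc)(1)
b = delayed(inc)(2)
total = delayed(add)(a, b)

# The graph is only executed when compute() is called.
result = total.compute()
print(result)
```

Because `a` and `b` do not depend on each other, the scheduler is free to run them in parallel before combining them in `add`.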

290 questions
0
votes
1 answer

Dask - load dataframe from SQL without specifying index_col

I'm trying to load a Dask dataframe from a SQL connection. Per the read_sql_table documentation, it is necessary to pass in an index_col. What should I do if there's a possibility that there are no good columns to act as index? Could this be a…
jrdzha
  • 161
  • 2
  • 12
0
votes
1 answer

Dask with TLS connection cannot end the program with to_parquet method

I am using Dask to process 10 files, each about 142MB in size. I built a method with the delayed decorator; the following is an example: @dask.delayed def process_one_file(input_file_path, save_path): res = [] for line in…
DuFei
  • 447
  • 6
  • 20
0
votes
1 answer

Understanding Dask-ML

I have read through the Dask-ML documentation and I have googled around, but I have two questions I would like some clarity on, if anyone could assist: by saying "using a cluster of computers", does it mean processing is distributed across other people's…
Leockl
  • 1,906
  • 5
  • 18
  • 51
0
votes
1 answer

How to parallelize a loop with Dask?

I find the Dask documentation quite confusing. Let's say I have a function: import random import dask def my_function(arg1, arg2, arg3): val1 = random.uniform(arg1, arg2) val2 = random.uniform(arg2, arg3) return val1 + val2 some_list =…
Qubix
  • 4,161
  • 7
  • 36
  • 73
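One common way to approach the loop question above is to wrap each call in `dask.delayed` and compute the results together. This is only a sketch under assumptions: the contents of `some_list` are truncated in the excerpt, so the tuples below are hypothetical placeholders, and the variable names `val1` and `val2` are used consistently.

```python
import random
import dask

def my_function(arg1, arg2, arg3):
    val1 = random.uniform(arg1, arg2)
    val2 = random.uniform(arg2, arg3)
    return val1 + val2

# Hypothetical inputs; the list in the original question is not shown.
some_list = [(0, 1, 2), (1, 2, 3), (2, 3, 4)]

# Build one lazy task per loop iteration instead of calling the function directly.
lazy_results = [dask.delayed(my_function)(*args) for args in some_list]

# dask.compute runs all tasks and returns a tuple of results.
results = dask.compute(*lazy_results)
print(results)
```

For a function this cheap, the scheduling overhead will likely outweigh any parallel speedup; the pattern pays off when each call does substantial work.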
0
votes
1 answer

Setting up mini Dask cluster

To better understand Dask, I decided to set up a small Dask cluster: two servers with 32GB RAM and a Mac. All are part of a local LAN and all run identical versions of Python 3.5 + Dask installed under a virtual environment. I installed sshfs on both…
Kira
  • 387
  • 1
  • 3
  • 8
0
votes
2 answers

Dask delayed function call with non-passed parameters

I am seeking to better understand the following behavior when using dask.delayed to call a function that depends on parameters. The issue seems to arise when parameters are specified in a parameters file read by configparser. Here is a complete…
elltrain
  • 82
  • 4
0
votes
1 answer

Dealing with interdependent files in graph-parallel computation

I’m trying to parallelize the following code (MCVE) by creating a task graph using dask.delayed (or by implementing a computational graph myself): os.chdir('./kitchen1') write_dough() # writes file ./dough write_topping() # writes file…
stalostan
  • 5
  • 1
  • 4
0
votes
1 answer

Dask dataframe groupby fails with type error, but identical pandas groupby succeeds

I have created a dask dataframe from geopandas futures that each yield a pandas dataframe following the example here: https://gist.github.com/mrocklin/e7b7b3a65f2835cda813096332ec73ca daskdf = dd.from_delayed(lazy_dataframes,lazy_dataframes,…
bw4sz
  • 2,237
  • 2
  • 29
  • 53
0
votes
1 answer

Dask Locality, how to read from a local worker file?

I'm trying to read a unique local file from each worker; however, I get the same result across all the workers instead of a unique result from each worker. Can someone please point out what I'm doing wrong? from dask.distributed import Client,…
Rsokolov
  • 53
  • 1
  • 6
0
votes
1 answer

How does Dask execute code on multiple VMs in the cloud

I wrote a program with dask and delayed and now I want to run it on several machines in the cloud. But there's one thing I don't understand - how does dask run the code on multiple machines in the cloud without having all the dependencies of the…
0
votes
1 answer

When Dask tasks run multiple times, which result is used?

First, read this question: Repeated task execution using the distributed Dask scheduler Now, when Dask decides to rerun a task due to worker stealing or a task failing (as a result of memory limits per process for example), which task result gets…
medley56
  • 1,181
  • 1
  • 14
  • 29
0
votes
0 answers

SGECluster in multiple queues

I'm using dask.distributed to launch jobs on an SGE cluster (https://jobqueue.dask.org/en/latest/generated/dask_jobqueue.SGECluster.html#dask_jobqueue.SGECluster) via dask.bags and/or dask.delayed. Everything works nicely. However, I may have some…
0
votes
1 answer

Dask slower than vanilla Python? What am I doing wrong?

I'm testing Dask and I can't understand how Dask is slower than plain Python. I developed two examples in Jupyter to get the time for each, and I think that I am doing something wrong. The first with Dask: 28.5 seconds, and after in plain Python…
0
votes
1 answer

Dask Vs Multiprocessing when using C pointers

When I use C pointers in Python and try to process them using Dask, it works like a pro. But when I try to use Python's multiprocessing module, it spits out a pointer reference error. How is Dask able to overcome the multiprocessing module when using…
Naren Babu R
  • 453
  • 2
  • 9
  • 33
0
votes
2 answers

Splitting very large csv files into smaller files

Is Dask suitable for reading large CSV files in parallel and splitting them into multiple smaller files?