Questions tagged [dask-delayed]

Dask.Delayed refers to the Python interface that consists of the delayed function, which wraps a function or object to create Delayed proxies. Use this tag for questions related to the Python interface.

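A minimal sketch of the interface described above: `delayed` wraps plain functions (or objects) so that calling them builds a task graph instead of running immediately, and `compute()` executes that graph. The functions `inc` and `add` are hypothetical examples.

```python
import dask
from dask import delayed

def inc(x):
    return x + 1

def add(x, y):
    return x + y

# Wrapping a function with delayed makes each call return a lazy Delayed proxy
a = delayed(inc)(1)
b = delayed(inc)(2)
total = delayed(add)(a, b)  # depends on a and b; nothing has run yet

# compute() executes the graph; a and b are independent and may run in parallel
result = total.compute()
print(result)  # 5
```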

290 questions
0 votes, 0 answers

I want to lemmatize a dask dataframe but I am stuck

I am new to dask and was wondering if anyone could give me a hand. I have a large text dataset (>20GB) and need to lemmatize a column. My current function, which works with pandas directly, is wnl = WordNetLemmatizer() def lemmatizing(sentence):…
osterburg
0 votes, 1 answer

How can I combine sequential as well as parallel execution of delayed function calls?

I am stuck in a strange place. I have a bunch of delayed function calls that I want to execute in a certain order. While executing in parallel is trivial: res = client.compute([myfuncs]) res = client.gather(res) I can't seem to find a way to…
suvayu
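A minimal sketch, assuming the required ordering can be expressed as data dependencies: dask runs independent delayed calls in parallel and orders dependent ones when one call's result is passed into the next. The functions `load`, `process`, `combine`, and `report` are hypothetical; the same delayed objects could also be passed to `client.compute` on a distributed `Client`.

```python
import dask
from dask import delayed

@delayed
def load(i):
    return i * 10

@delayed
def process(x):
    return x + 1

@delayed
def combine(values):
    return sum(values)

@delayed
def report(done):
    # Hypothetical final step; receiving `done` forces it to run last
    return f"finished with {done}"

# Independent calls have no edges between them, so they may run in parallel
loaded = [load(i) for i in range(3)]
processed = [process(x) for x in loaded]  # each waits only on its own load()

# Passing results as arguments creates dependencies, which enforce order:
# combine() starts only after every process() task has finished.
final = combine(processed)
message = report(final)

print(dask.compute(message)[0])  # finished with 33
```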
0 votes, 1 answer

Dask Delayed caching

What is the current state of the art with regards to integrating caching into dask-delayed graphs? I have large graphs that have paths that would benefit significantly from persistent caching (i.e. disk, and hashed by params) for each separate run…
headsling
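One hand-rolled approach, sketched here as an assumption rather than as a dask feature: wrap the task function in a decorator that hashes its arguments and stores each result as a pickle file, so a later run skips the work for parameters it has already seen. `disk_cached`, `CACHE_DIR`, `CALLS`, and `slow_square` are hypothetical names, and the sketch assumes the arguments pickle deterministically.

```python
import functools
import hashlib
import os
import pickle
import tempfile

import dask

CACHE_DIR = tempfile.mkdtemp()  # assumption: any writable directory works
CALLS = []  # records real executions, to show cache hits

def disk_cached(func):
    """Hypothetical helper: cache func's result on disk, keyed by a hash of its arguments."""
    @functools.wraps(func)
    def wrapper(*args, **kwargs):
        digest = hashlib.sha256(
            pickle.dumps((func.__name__, args, sorted(kwargs.items())))
        ).hexdigest()
        path = os.path.join(CACHE_DIR, digest + ".pkl")
        if os.path.exists(path):
            with open(path, "rb") as f:
                return pickle.load(f)  # cache hit: skip the real work
        result = func(*args, **kwargs)
        with open(path, "wb") as f:
            pickle.dump(result, f)
        return result
    return wrapper

@dask.delayed
@disk_cached
def slow_square(x):
    CALLS.append(x)
    return x * x

first = dask.compute(slow_square(4), scheduler="sync")[0]
second = dask.compute(slow_square(4), scheduler="sync")[0]  # loaded from disk
```

Note that the task is still scheduled on every run; only its body is skipped on a cache hit.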
0 votes, 1 answer

How to create a dask dataframe from a data string separated by tabs and newline characters

My data is in the form of a string separated by the \ character (for columns) and by the newline \n character (for rows): ID\Product\quantity\n1\xx\2. It looks like Dask.array.from_array() supports only an array as input. Although I can convert the above text to…
0 votes, 1 answer

Merging datasets using dask proves unsuccessful

I am trying to merge a number of large data sets using Dask in Python to avoid loading issues. I want to save the merged file as .csv. The task proves harder than imagined: I put together a toy example with just two data sets. The code I then use is…
MCS
0 votes, 1 answer

Passing an iterator to dask.delayed function

I'm trying to pass an iterator over a (non-standard) file-like object to a dask.delayed function. When I try to compute(), I get the following message from dask, and the traceback below. distributed.protocol.pickle - INFO - Failed to serialize …
suvayu
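Open file handles and iterators generally cannot be pickled, which is consistent with the serialization message quoted above. A common workaround, sketched here under the assumption that the object can be re-created from a path, is to pass the picklable path to the delayed function and open and iterate inside the task. `count_lines` and the temporary files are hypothetical.

```python
import os
import tempfile

import dask

@dask.delayed
def count_lines(path):
    # Open and iterate inside the task: only the path string is sent to the
    # worker, not an open file object or its iterator.
    with open(path) as f:
        return sum(1 for _ in f)

# Hypothetical input files with 3 and 5 lines
tmpdir = tempfile.mkdtemp()
paths = []
for i, n in enumerate([3, 5]):
    p = os.path.join(tmpdir, f"part{i}.txt")
    with open(p, "w") as f:
        f.write("line\n" * n)
    paths.append(p)

counts = dask.compute(*[count_lines(p) for p in paths])
print(counts)  # (3, 5)
```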
0 votes, 1 answer

Running a function on a slice of a dask array

I have been trying to figure out how to execute functions on slices of a dask array. For example, if I create the following dask array: import numpy as np import dask.array as da x = da.random.normal(10, 0.1, size=(200, 4),chunks=(100, 100)) and…
Eric Eckert
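A minimal sketch of two ways to run a function on a slice, using a deterministic array in place of the random one from the question: `map_blocks` for elementwise functions, and `delayed` for functions that need the whole slice at once. The choice of column 1 and of `np.median` is arbitrary.

```python
import numpy as np
import dask
import dask.array as da

# Deterministic stand-in for the random array in the question
x = da.from_array(np.arange(800, dtype=float).reshape(200, 4), chunks=(100, 4))

col = x[:, 1]  # slicing is lazy and returns another dask array

# Elementwise function applied block by block to the slice
squared = col.map_blocks(np.square)

# Whole-slice function wrapped with delayed; dask concatenates the slice
# into a single NumPy array before calling np.median on it.
med = dask.delayed(np.median)(col)

sq_vals, med_val = dask.compute(squared, med)
print(med_val)  # 399.0
```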
0 votes, 1 answer

Error: No module named 'Custom Class' while passing a Client object in the custom class's constructor in dask

I have been trying to write custom classes for Preprocessing followed by Feature selection and Machine Learning algorithms as well. I cracked this (preprocessing only) using @delayed. But when I read from the tutorials that the same can be achieved…
Asif Ali
0 votes, 1 answer

Can't train Keras model with Dask?

From the simple Dask delayed examples I have read, I expected that I could essentially replicate GridSearchCV from scikit-learn with a couple of function calls as follows. It appears that the model is never fit (model.fit(...)) because the rest…
B_Miner
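A minimal sketch of a delayed grid search, assuming the fitting and scoring happen inside the delayed function and that all tasks are passed to a single `dask.compute` call. `fit_and_score` is a hypothetical placeholder with a made-up scoring rule; no Keras code is shown, and real models may need a process-based scheduler or a distributed `Client` rather than threads.

```python
import itertools

import dask

@dask.delayed
def fit_and_score(learning_rate, batch_size):
    # Hypothetical placeholder: build, fit and evaluate a model here.
    # Returning only a score keeps the results small.
    return -abs(learning_rate - 0.01) - batch_size / 1000

grid = {"learning_rate": [0.1, 0.01], "batch_size": [32, 64]}
params = [dict(zip(grid, combo)) for combo in itertools.product(*grid.values())]

# Each call only records a task; nothing is fitted until compute()
tasks = [fit_and_score(**p) for p in params]
scores = dask.compute(*tasks)

best_score, best_params = max(zip(scores, params), key=lambda t: t[0])
print(best_params)
```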
0 votes, 0 answers

Custom search in Dask

I have 1000 regex patterns which I have to search for in each of 9000 strings. A brute-force approach using a pandas list took 25 min for this task. I used dask's delayed function to parallelize the entire function. It took 9 min to…
ANKIT JHA
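A minimal sketch, assuming the strings can be split into batches: one delayed task per batch, rather than per (pattern, string) pair, keeps scheduling overhead low. The patterns, strings, batch size, and the `search_batch` and `batched` helpers are hypothetical. Because matching with Python's `re` module generally holds the GIL, passing `scheduler="processes"` to `dask.compute` may scale better than the default threads.

```python
import re

import dask

# Hypothetical data standing in for the 1000 patterns x 9000 strings case
patterns = [re.compile(p) for p in [r"\bdask\b", r"\d{3}", r"delay(ed)?"]]
strings = ["dask delayed", "call 555 now", "nothing here", "delay 123"]

@dask.delayed
def search_batch(batch):
    # For each string, list the patterns that match somewhere in it
    return [[p.pattern for p in patterns if p.search(s)] for s in batch]

def batched(seq, size):
    return [seq[i:i + size] for i in range(0, len(seq), size)]

parts = dask.compute(*[search_batch(b) for b in batched(strings, 2)])
matches = [m for part in parts for m in part]  # one list per input string
```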
0 votes, 0 answers

Unable to Replace a Dask Series Partition

I'm trying to replace a Series dask partition with my own partition. I've used the code snippet given by @MRocklin in this post. list_of_delayed = dask_df.to_delayed() new_partition = dask.delayed(pd.read_csv)(filename) list_of_delayed[i] =…
Dhruv Kumar
0 votes, 1 answer

Design computation graph in dask

Until now, I've used dask with get and a dictionary to define the dependency graph of my tasks. But this means I have to define my entire graph from the beginning, and now I want to add new tasks from time to time (with dependencies on old…
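A sketch contrasting the two styles, with hypothetical `inc` and `add` functions: a raw dictionary graph must be complete before `get` is called, while `delayed` objects carry their own graph, so new tasks can depend on earlier results at any time.

```python
from dask import delayed
from dask.threaded import get

def inc(x):
    return x + 1

def add(x, y):
    return x + y

# Raw-graph style: the whole dictionary must exist before calling get
dsk = {"a": 1, "b": (inc, "a"), "c": (add, "a", "b")}
print(get(dsk, "c"))  # 3

# delayed style: the graph grows as new calls are made on earlier results
a = delayed(1)
b = delayed(inc)(a)
c = delayed(add)(a, b)

# ...later, a new task that depends on old ones is just another call
d = delayed(add)(c, b)
print(d.compute())  # 5
```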
0 votes, 1 answer

Dask Delayed ignores name for dependent variables

When creating a graph of calculations using delayed, I'm trying to assign names so that the graph is readable when I visualize it. However, for delayed variables that are dependent on functions, the name parameter doesn't seem to affect the key. Here's a…
Michael S.
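A minimal sketch, assuming the goal is readable keys for the outputs of each call: the `dask_key_name` keyword sets the key of a delayed call's result, whereas `delayed(func, name=...)` names the wrapped function object itself. `load`, `double`, and the key strings are hypothetical; keys must be unique within a graph.

```python
import dask

@dask.delayed
def load(path):
    return len(path)  # hypothetical stand-in for reading data

@dask.delayed
def double(x):
    return 2 * x

# dask_key_name sets the key of each call's result in the task graph
raw = load("data.csv", dask_key_name="load-data")
result = double(raw, dask_key_name="double-data")

print(result.key)        # double-data
print(result.compute())  # 16
```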
0 votes, 2 answers

nested dask.compute not blocking

dask.compute(...) is expected to be a blocking call. However, when I have nested dask.compute calls and the inner one does I/O (like dask.dataframe.read_parquet), the inner dask.compute is not blocking. Here's a pseudo code example: import dask,…
user1527390
0 votes, 1 answer

Using Dask compute causes execution to hang

This is a follow-up question to a potential answer to one of my previous questions on using Dask compute to access one element in a large array. Why does using Dask compute cause the execution below to hang? Here's the working code…
sudouser2010