Questions tagged [dask-delayed]

Dask.Delayed refers to the Python interface consisting of the delayed function, which wraps a function or object to create Delayed proxies. Use this tag for questions related to this interface.

290 questions
2 votes · 0 answers

Large Dask Processes Fail When Creating and Storing DataFrame

I have a number of image files that I'm running a face recognition model on, in order to generate a Dask Dataframe of facial encodings, the file paths for the images that contain each face, and the coordinates in the image of each face. Because I…
asked by DataOrc (769)
2 votes · 1 answer

Do dask delayed functions use the same conda environment?

I've installed dask using conda. When I create delayed functions and run them over my PBS cluster using dask, how do I ensure that the worker nodes activate the same conda environment before running the delayed functions?
2 votes · 2 answers

Dask: How to use delayed functions with worker resources?

I want to make a Dask Delayed flow which includes CPU and GPU tasks. GPU tasks can only run on GPU workers, and a GPU worker only has one GPU and can only handle one GPU task at a time. Unfortunately, I see no way to specify worker resources in the…
asked by braddock (1,345)
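Worker resources do work with delayed graphs: declare an abstract resource when starting each worker (e.g. `dask-worker scheduler:8786 --resources "GPU=1"`), then constrain the computation when submitting it. A minimal runnable sketch on a local cluster, where `"GPU"` is just a label and no real GPU is assumed:

```python
import dask
from dask.distributed import Client, LocalCluster

@dask.delayed
def cpu_task(x):
    return x + 1

@dask.delayed
def gpu_task(x):
    return x * 10

# One in-process worker advertising a single abstract "GPU" resource, so it
# holds at most one GPU-constrained task at a time.
cluster = LocalCluster(n_workers=1, threads_per_worker=2,
                       processes=False, resources={"GPU": 1})
client = Client(cluster)

flow = gpu_task(cpu_task(1))
# A plain dict constrains every task in this computation; per-task
# constraints can instead be attached with dask.annotate(resources={...}).
future = client.compute(flow, resources={"GPU": 1})
result = future.result()  # (1 + 1) * 10 = 20
client.close(); cluster.close()
```

Because the worker advertises only one unit of the `"GPU"` resource, the scheduler will never run two GPU-constrained tasks on it concurrently, which is exactly the one-task-per-GPU behaviour the question asks for.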
2 votes · 1 answer

Adding a new column to dask dataframe throws ValueError: Length of values does not match length of index

I understand that this traceback, ValueError: Length of values does not match length of index, arises from the fact that one dataframe is longer or shorter than the other during ddf.assign(new_col=ts_col or the same operation in…
asked by gies0r (4,723)
2 votes · 1 answer

In Dask, is there a way to process dependencies as they become available, as in multiprocessing.imap_unordered?

I have a simple graph structure that takes N independent tasks and then aggregates them. I do not care in what order the results of the independent tasks are aggregated. Is there a way that I can speed up computation by acting on the dependencies as…
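Yes: with the distributed scheduler, `distributed.as_completed` yields futures in the order they finish, which plays the same role as `multiprocessing.imap_unordered`. A runnable sketch that aggregates results as soon as each one is ready:

```python
from dask.distributed import Client, LocalCluster, as_completed

def work(x):
    return x * x

cluster = LocalCluster(n_workers=2, processes=False)
client = Client(cluster)

futures = client.map(work, range(5))
total = 0
# as_completed yields futures in completion order, so aggregation starts as
# soon as the first task finishes rather than after all N are done.
for fut in as_completed(futures):
    total += fut.result()
# total == 0 + 1 + 4 + 9 + 16 == 30
client.close(); cluster.close()
```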
2 votes · 1 answer

How to write a dask dataframe to a single CSV in AWS S3 using dask delayed, to make it faster?

Currently I am using the code below, but it is taking too much time. I am converting the dask dataframe to a buffer and using multipart upload to push it to S3: def multi_part_upload_with_s3(file_buffer_obj,BUCKET_NAME,key_path): client =…
2 votes · 1 answer

Is it possible to read parquet metadata from Dask?

I have thousands of parquet files that I need to process. Before processing the files, I'm trying to get various information about the files using the parquet metadata, such as number of rows in each partition, mins, maxs, etc. I tried reading…
asked by dan (183)
2 votes · 1 answer

How to reduce the time taken to convert a dask dataframe to a pandas dataframe

I have a function that reads large CSV files using a dask dataframe and then converts to a pandas dataframe, which takes quite a lot of time. The code is: def t_createdd(Path): dataframe = dd.read_csv(Path, sep = chr(1), encoding = "utf-16") return…
asked by K.S (113)
2 votes · 1 answer

Dask distributed apparently not releasing memory on task completion

I'm trying to execute a custom dask graph on a distributed system, but it does not seem to release the memory of finished tasks. Am I doing something wrong? I've tried changing the number of processes and using a local cluster, but it…
2 votes · 1 answer

MODIS(MYD06_L2) file concatenation using xarray and dask

I am trying to open multiple MODIS files (MYD06_L2) using xarray (xr.open_mfdataset). I can open a single file, or maybe a few files, but I am not able to open many files or one day's files, as they have different dimensions. d06 = xr.open_mfdataset(M06_2040,…
2 votes · 0 answers

Generating batches of images in dask

I just started with dask because it offers great parallel processing power. I have around 40000 images on disk which I am going to use to build a classifier with some DL library, say Keras or TF. I collected this meta-info (image path and…
asked by enterML (2,110)
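One common pattern is to group the file paths into fixed-size batches and wrap the loader for each batch in delayed. The loader below is a hypothetical stub that returns a dummy array so the sketch runs without any image files; a real version would read and resize each image with e.g. PIL or a Keras utility:

```python
import numpy as np
import dask
from dask import delayed

# Hypothetical stub loader: a real one would open and decode the image file.
def load_image(path):
    return np.zeros((32, 32, 3), dtype="uint8")

@delayed
def load_batch(paths):
    # Stack one batch of images into a single (batch, H, W, C) array.
    return np.stack([load_image(p) for p in paths])

paths = [f"img_{i}.jpg" for i in range(10)]  # placeholder file names
batch_size = 4
batches = [load_batch(paths[i:i + batch_size])
           for i in range(0, len(paths), batch_size)]

arrays = dask.compute(*batches)  # materialize all batches in parallel
# shapes: [(4, 32, 32, 3), (4, 32, 32, 3), (2, 32, 32, 3)]
```

Each resulting array can then be fed to the training loop; delaying whole batches rather than single images keeps the task graph small, which matters at 40000 files.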
2 votes · 1 answer

Controlling number of cores/threads in dask

I have a workstation with these specifications: Architecture: x86_64 CPU op-mode(s): 32-bit, 64-bit Byte Order: Little Endian Address sizes: 46 bits physical, 48 bits virtual CPU(s): 16 On-line CPU(s) list:…
asked by muammar (951)
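For the local schedulers, the worker count can be passed straight to `.compute(num_workers=...)`; with the distributed scheduler the equivalent knobs are `LocalCluster(n_workers=..., threads_per_worker=...)`. A sketch on the threaded scheduler:

```python
import dask.bag as db

nums = db.from_sequence(range(100), npartitions=8)

# Cap this computation's thread pool at 4 workers regardless of how many
# cores the machine reports.
total = nums.map(lambda x: x * x).sum().compute(scheduler="threads",
                                                num_workers=4)
# total == sum of squares 0..99 == 328350
```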
2 votes · 1 answer

Reading large CSV files using delayed (DASK)

I'm using delayed to read many large CSV files: import pandas as pd def function_1(x1, x2): df_d1 = pd.read_csv(x1) # Some calculations on df_d1 using x2. return df_d1 def function_2(x3): df_d2 = pd.read_csv(x3) …
asked by Eghbal (3,892)
2 votes · 1 answer

Merging a huge list of dataframes using dask delayed

I have a function which returns a dataframe to me. I am trying to use this function in parallel by using dask. I append the delayed objects of the dataframes into a list. However, the run-time of my code is the same with and without dask.delayed. I…
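A common reason delayed shows no speedup in this situation: calling `.compute()` on each list element executes the tasks one by one. Build the whole list lazily and make a single `dask.compute` call; with delayed pandas objects the final merge is then plain `pd.concat`. Sketch:

```python
import pandas as pd
import dask
from dask import delayed

@delayed
def make_df(i):
    # Stand-in for the real dataframe-producing function.
    return pd.DataFrame({"a": [i, i]})

parts = [make_df(i) for i in range(4)]

# One dask.compute call runs the whole list in parallel; looping and calling
# part.compute() on each element would execute them serially instead.
dfs = dask.compute(*parts)
merged = pd.concat(dfs, ignore_index=True)
# merged.shape == (8, 1)
```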
2 votes · 1 answer

Dask dataframe from delayed zip csv

I am trying to create a dask dataframe from a set of zipped CSV files. Reading up on the problem, it seems that dask needs to use dask.distributed delayed() import glob import dask.dataframe as dd import zipfile import pandas as pd from…
asked by user3237314 (21)