I have a function crop_images_circle(file_dir, kmeans_dir, folders_dir, filename) that does not return anything. I am trying to use dask to parallelise the computation.

Implementation without dask, for roughly 100 files:

for filename in os.listdir(file_dir):
    crop_images_circle(file_dir,kmeans_dir,folders_dir,filename)

Execution time in seconds: 53.58223843574524

Implementation with dask:

# Version 1
for filename in os.listdir(file_dir):
    x = delayed(crop_images_circle)(file_dir,kmeans_dir,folders_dir,filename)
    x.compute()

Execution time in seconds: 46.36917209625244

# Version 2
for filename in os.listdir(file_dir):
    x = delayed(crop_images_circle)(file_dir,kmeans_dir,folders_dir,filename)

x.compute()

Version 2 processes just one image, and I am not sure why. The documentation does mention similar usage. The time required is also not significantly different from the plain loop. Is there any improvement from using dask, or am I messing up the syntax?

Sushant

1 Answer

This is from the dask best practices and might answer the question above:

Don’t rely on side effects

Delayed functions only do something if they are computed. You will always need to pass the output to something that eventually calls compute.
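Applied to the question: Version 1 calls compute() inside the loop, so each task finishes before the next one is even created and nothing runs in parallel. Version 2 rebinds x on every iteration, so the final x.compute() only runs the last delayed call. A minimal sketch of the usual pattern is to collect all delayed calls in a list and compute them together. The crop_images_circle body below is a hypothetical stand-in that just writes a marker file, and run_all and demo are names made up for this illustration.

```python
import os
import tempfile

import dask
from dask import delayed


def crop_images_circle(file_dir, kmeans_dir, folders_dir, filename):
    # Hypothetical stand-in for the real function: its only effect is a
    # side effect (writing a marker file); kmeans_dir is unused here.
    out_path = os.path.join(folders_dir, filename + ".done")
    with open(out_path, "w") as fh:
        fh.write(os.path.join(file_dir, filename))


def run_all(file_dir, kmeans_dir, folders_dir, scheduler="threads"):
    # Build every lazy task first; nothing runs at this point.
    tasks = [
        delayed(crop_images_circle)(file_dir, kmeans_dir, folders_dir, name)
        for name in sorted(os.listdir(file_dir))
    ]
    # A single compute call hands all tasks to the scheduler at once.
    dask.compute(*tasks, scheduler=scheduler)
    return len(tasks)


def demo():
    # Create five dummy input files, run the tasks, count the outputs.
    with tempfile.TemporaryDirectory() as root:
        in_dir = os.path.join(root, "in")
        out_dir = os.path.join(root, "out")
        os.makedirs(in_dir)
        os.makedirs(out_dir)
        for i in range(5):
            open(os.path.join(in_dir, "img%d.png" % i), "w").close()
        run_all(in_dir, None, out_dir)
        return len(os.listdir(out_dir))
```

Whether this is actually faster depends on the work inside the function. If the image processing is CPU-bound pure Python, the threaded scheduler may give little speedup, and passing scheduler="processes" could be worth trying; that is an assumption to measure, not a guarantee.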


Sushant