
This is now a GitHub issue.

What does the compute parameter of Dask DataFrame's set_index do?

df.set_index(col, compute=True)

The documentation says

compute: bool, default False

  • Whether or not to trigger an immediate computation. Defaults to False. Note, that even if you set compute=False, an immediate computation will still be triggered if divisions is None.

This would suggest that if I provide divisions and set compute=True, immediate computation will be triggered. This does not seem to be true, however.

import dask.datasets
df = dask.datasets.timeseries()

# Nothing gets submitted to the scheduler
df.set_index(
    'name', 
    divisions=('Alice', 'Michael', 'Zelda'), 
    compute=True
) 

Going down the stack of functions that set_index calls, it appears that the only place where compute is actually used is in rearrange_by_column_disk. And indeed:

# Still, nothing gets submitted
df.set_index(
    'name', 
    divisions=('Alice', 'Michael', 'Zelda'), 
    shuffle='tasks',
    compute=True
) 

# Something is computed here
df.set_index(
    'name', 
    divisions=('Alice', 'Michael', 'Zelda'), 
    shuffle='disk',
    compute=True
) 

So what happens, exactly?

I suspect that the actual resulting partitions might be computed and saved to disk. If that's the case, then how could I tell this has happened?
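One way to check this without relying on the dashboard is to count the tasks that actually run during the call itself. The following is only a minimal sketch, assuming a local (non-distributed) scheduler: `TaskCounter` and `tasks_run_by` are hypothetical names introduced here, built on `dask.callbacks.Callback`, whose hooks, as far as I know, are only invoked by dask's local schedulers.

```python
import dask
from dask.callbacks import Callback


class TaskCounter(Callback):
    """Counts tasks finished by dask's local schedulers while active."""

    def __init__(self):
        super().__init__()
        self.count = 0

    def _posttask(self, key, result, dsk, state, worker_id):
        # Invoked by the local scheduler after each task completes.
        self.count += 1


def tasks_run_by(build):
    """Hypothetical helper: call build() and report how many tasks ran.

    Forces the single-threaded local scheduler for the duration of the
    call, so that the callback hooks are the ones being used.
    """
    counter = TaskCounter()
    with dask.config.set(scheduler="synchronous"), counter:
        result = build()
    return result, counter.count


# Usage with the calls from the question (keyword names as in the question;
# they may differ in other dask versions):
#
#   _, n = tasks_run_by(lambda: df.set_index(
#       'name', divisions=('Alice', 'Michael', 'Zelda'),
#       shuffle='disk', compute=True))
#   print(n)  # 0 would mean the call only built a lazy graph
```

A nonzero count for the shuffle='disk' call and zero for the other two would be consistent with the observations above. It would not by itself show what was written to disk or where.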

Dahn
  • are you assigning the result to anything? [set_index](https://docs.dask.org/en/latest/generated/dask.dataframe.DataFrame.set_index.html) is not an in-place operation. – Michael Delgado Nov 18 '21 at 00:53
  • dask doesn't automatically save anything to disk. what exactly do you mean when you say "nothing happens"? are you watching the dask dashboard or just noting that the index hasn't changed? you need to do `df = df.set_index()`. – Michael Delgado Nov 18 '21 at 01:04
  • Right, I could've been clearer. I mean that no computation takes place in all but the last case. I am basing that on the dashboard and the time taken to actually perform the operation. I will edit the question to be more explicit tomorrow. – Dahn Nov 18 '21 at 01:47
