If I have an already indexed Dask dataframe with

>>> A.divisions
(None, None)
>>> A.npartitions
1

and I want to set the divisions, so far I'm doing

A.reset_index().set_index("index", divisions=sorted(divisions))

because A.repartition(divisions=sorted(divisions)) complains "left side of old and new divisions are different". Is there a better way?

astrojuanlu

1 Answer

As of dask.__version__ == '0.16.0', if you happen to know the divisions of an existing dataframe, you can assign them directly:

A.divisions = tuple(divisions)
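
Since direct assignment does not move any data, it can help to sanity-check the candidate divisions first. Below is a minimal sketch in plain Python, assuming only the two invariants that divisions must satisfy: there are npartitions + 1 of them, and they are sorted. The helper name check_divisions and the sample values are hypothetical, not part of dask.

```python
def check_divisions(divisions, npartitions):
    """Return `divisions` as a tuple if it is plausible for `npartitions` partitions.

    Hypothetical helper (not a dask function). It only checks the shape of the
    divisions; it cannot verify that they match the data actually stored in
    each partition.
    """
    divisions = tuple(divisions)
    # A dataframe with n partitions is described by n + 1 division boundaries.
    if len(divisions) != npartitions + 1:
        raise ValueError(
            f"expected {npartitions + 1} divisions, got {len(divisions)}"
        )
    # Boundaries must be in non-decreasing order.
    if any(a > b for a, b in zip(divisions, divisions[1:])):
        raise ValueError("divisions must be sorted")
    return divisions


# Example with made-up values: one partition whose index spans 0..100.
checked = check_divisions([0, 100], npartitions=1)
print(checked)
```

With a real dataframe, the result would then be assigned as A.divisions = check_divisions(divisions, A.npartitions). If the number of known divisions does not match A.npartitions + 1, the data itself has to be repartitioned, which is what the set_index(..., divisions=...) call in the question does.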
MRocklin