I was wondering if somebody could help me understand how Bag objects handle partitions. Put simply, I am trying to group the items in a Bag so that each group ends up in its own partition. What confuses me is that the Bag.groupby() method takes a number of partitions as an argument. Shouldn't that be implied by the grouping function, e.g. two partitions if the grouping function returns a boolean?
>>> import dask.bag
>>> a = dask.bag.from_sequence(range(20), npartitions=1)
>>> a.npartitions
1
>>> b = a.groupby(lambda x: x % 2 == 0)
>>> b.npartitions
1
I'm obviously missing something here. Is there a way to group Bag items so that each group lands in a separate partition?
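One way to see why the partition count could be a separate argument from the key function is a toy model of a hash-based shuffle. The sketch below assumes that a groupby places each group in partition `hash(key) % npartitions`; this is a simplified, hypothetical model (the function `hash_partition_groups` is made up for illustration), not dask's actual implementation. Under that assumption, with `npartitions=1` both boolean groups share the single partition, which matches the output above.

```python
from collections import defaultdict


def hash_partition_groups(items, key, npartitions):
    """Hypothetical model of a hash-based groupby shuffle (not dask's code).

    Items are grouped by key, then each whole group is placed in
    partition hash(key) % npartitions.
    """
    groups = defaultdict(list)
    for x in items:
        groups[key(x)].append(x)

    partitions = [[] for _ in range(npartitions)]
    for k, members in groups.items():
        partitions[hash(k) % npartitions].append((k, members))
    return partitions


def is_even(x):
    return x % 2 == 0


# One output partition: both groups (True and False) end up in it.
one = hash_partition_groups(range(20), is_even, npartitions=1)
print(len(one), len(one[0]))  # 1 2

# Two output partitions: hash(False) == 0 and hash(True) == 1,
# so each boolean group happens to get its own partition.
two = hash_partition_groups(range(20), is_even, npartitions=2)
print([len(p) for p in two])  # [1, 1]

# Keys 0 and 2 collide modulo 2, so both groups share partition 0.
clash = hash_partition_groups([0, 2], lambda x: x, npartitions=2)
print([len(p) for p in clash])  # [2, 0]
```

In this model the number of output partitions is a free choice rather than something derived from the keys. Choosing `npartitions` equal to the number of distinct keys does not guarantee one group per partition, because two keys can hash to the same slot, as the last case shows.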