I'm looking into grouping the elements in a flow into batches of a fixed size.
In pseudocode:
PCollection[String].apply(Grouped.size(10))
Basically, this converts a PCollection[String] into a PCollection[List[String]] where each list contains 10 elements. Since this is a batch pipeline, if the number of elements doesn't divide evenly by the batch size, the last list would contain the leftover elements.
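To make the intended semantics concrete, here is a minimal sketch in plain Python, with no pipeline involved. The helper name `grouped` is hypothetical, chosen only to mirror the pseudocode above; it shows the expected output shape for an in-memory sequence, not how a distributed runner would do it.

```python
from typing import Iterable, Iterator, List, TypeVar

T = TypeVar("T")


def grouped(elements: Iterable[T], size: int) -> Iterator[List[T]]:
    """Yield lists of `size` elements; the last list holds the leftovers.

    Hypothetical helper for illustration only, not part of any SDK.
    """
    if size < 1:
        raise ValueError("size must be at least 1")
    batch: List[T] = []
    for element in elements:
        batch.append(element)
        if len(batch) == size:
            # The batch is full, so emit it and start a new one.
            yield batch
            batch = []
    if batch:
        # Emit whatever is left when the input does not divide evenly.
        yield batch


batches = list(grouped([f"e{i}" for i in range(23)], 10))
print([len(b) for b in batches])  # [10, 10, 3]
```

With 23 input elements and a batch size of 10, this gives two full lists and one list with the 3 leftover elements.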
I have two ugly ideas: windows with fake timestamps, or a GroupBy using keys based on a random index to distribute the elements evenly. Both seem like too complex a solution for such a simple problem.
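For reference, the random-key idea can be sketched in plain Python as follows. This is only an illustration under assumptions: the function name `batch_by_random_key` and the `num_keys` and `seed` parameters are hypothetical, and an in-memory dictionary stands in for the GroupBy step.

```python
import random
from collections import defaultdict
from typing import Dict, Iterable, List, Optional, TypeVar

T = TypeVar("T")


def batch_by_random_key(
    elements: Iterable[T],
    size: int,
    num_keys: int,
    seed: Optional[int] = None,
) -> List[List[T]]:
    """Assign each element a random key, group by key, then chunk each group.

    Hypothetical helper for illustration only, not part of any SDK.
    """
    rng = random.Random(seed)
    groups: Dict[int, List[T]] = defaultdict(list)
    for element in elements:
        # The random key plays the role of the "random index" in the GroupBy idea.
        groups[rng.randrange(num_keys)].append(element)

    batches: List[List[T]] = []
    for members in groups.values():
        # Each key's group is cut into chunks of at most `size` elements.
        for start in range(0, len(members), size):
            batches.append(members[start:start + size])
    return batches
```

Note that this version can produce one undersized batch per key rather than a single leftover batch at the end, which is part of why it feels heavier than the problem deserves.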