After using TextIO.read to get a PCollection<String>
of the individual lines, is it possible to then use some kind of combine transform to into batches (groups of 25 for example)? So the return type would end up looking something like: PCollection<String, List<String>>
. It looks like it should be possible using some kind of CombineFn
, but the API is a little arcane to me still.
The context here is I'm reading CSV files (potentially very very large), parsing + processing the lines and turning them into JSON, and then calling a REST API... but I don't want to hit the REST API for each line individually because the REST API supports multiple items at a time (up to 1000, so not the whole batch).