I have an Apache Beam pipeline that loads a large import file of around 90GB. I've written the pipeline in the Apache Beam Java SDK.
Using the default settings for PipelineOptionsFactory, my job takes quite a while to complete.
How can I control and programmatically specify the parallelism of my job, and thus the number of workers?