I am implementing an UnboundedReader in order to use a custom data source (based on a company-internal, subscription-based Java API). When I execute the pipeline, I notice that multiple instances of my UnboundedReader are created. How does Beam decide how many times to call the
public abstract UnboundedSource.UnboundedReader<OutputT> createReader(PipelineOptions options, CheckpointMarkT checkpointMark)
method of UnboundedSource?
My split() method is implemented as:
public List<? extends UnboundedSource<MyRecord, MyCheckpointMark>> split(int desiredNumSplits, PipelineOptions options) throws Exception {
    List<MySubscriptionSource> list = new ArrayList<>(1);
    list.add(this);
    return list;
}
Is there a way to force only a single reader to be created?