For debugging purposes, I'd like to turn an unbounded PCollection into a bounded one. Is there a straightforward way to do this? It would be useful for, among other things, forcing a pipeline to complete.

I thought Sample.any() (Javadoc here: https://beam.apache.org/documentation/sdks/javadoc/2.5.0/org/apache/beam/sdk/transforms/Sample.html#any-long- ) would be what I needed, but that PTransform doesn't seem to change the boundedness of the PCollection.

EDIT: I tried the suggestion from the-hbar-tender, but wasn't able to make it work. Here's how I tried that:

BoundedReadFromUnboundedSource brfus = unbounded.withMaxNumRecords(10);

... where unbounded would have been created like this:

Read.Unbounded unbounded = new Read.Unbounded("some name", pubsubUnboundedSource);

... where pubsubUnboundedSource would have been created like this:

PubsubUnboundedSource pubsubUnboundedSource = new PubsubUnboundedSource(pubsubClientFactory, projectValueProvider, topicValueProvider, subscriptionValueProvider, "some timestamp attribute", "some id attribute", true);

... but pubsubClientFactory can't be instantiated, because PubsubClient is not public. I gave up there. Maybe there is another way to get at this.

  • See https://stackoverflow.com/a/34956344/10054105: `Read.from(unboundedSource).withMaxNumRecords(N)`. You should be able to play around with that. Another option would be to window it into a bounded window, although that doesn't answer the more general issue of "ending" a streaming job. – The hBar Tender Jul 30 '18 at 13:32
  • Thanks. I edited my post, showing how I tried that. (I think your comment should be a question.) – Ethan Herdrick Sep 04 '18 at 21:19
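
For reference, here is a minimal sketch of the `Read.from(unboundedSource).withMaxNumRecords(N)` approach from the comment above. It is only an illustration under some assumptions: it uses Beam's built-in `CountingSource.unbounded()` as a stand-in unbounded source rather than Pub/Sub, so it does not solve the `PubsubClient` visibility problem described in the question. The class name `BoundedDebugRead` and the helper `firstN` are made up for this example, and running it assumes a runner such as the DirectRunner is on the classpath.

```java
import org.apache.beam.sdk.Pipeline;
import org.apache.beam.sdk.io.CountingSource;
import org.apache.beam.sdk.io.Read;
import org.apache.beam.sdk.options.PipelineOptionsFactory;
import org.apache.beam.sdk.values.PCollection;

// Hypothetical class name, used only for this illustration.
public class BoundedDebugRead {

  // Hypothetical helper: reads at most maxRecords elements from an
  // unbounded source. CountingSource.unbounded() stands in for a real
  // streaming source here; it is NOT the Pub/Sub source from the question.
  static PCollection<Long> firstN(Pipeline pipeline, long maxRecords) {
    return pipeline.apply(
        "ReadFirstN",
        // Read.from(...) on an UnboundedSource yields Read.Unbounded;
        // withMaxNumRecords(...) wraps it in BoundedReadFromUnboundedSource.
        Read.from(CountingSource.unbounded()).withMaxNumRecords(maxRecords));
  }

  public static void main(String[] args) {
    Pipeline pipeline =
        Pipeline.create(PipelineOptionsFactory.fromArgs(args).create());

    PCollection<Long> firstTen = firstN(pipeline, 10);

    // Reports whether the resulting PCollection is bounded or unbounded.
    System.out.println(firstTen.isBounded());

    // Because the read is capped, the pipeline can run to completion.
    pipeline.run().waitUntilFinish();
  }
}
```

The same pattern should apply to any source that is exposed as an `UnboundedSource`. Whether a given connector exposes one publicly is a separate matter, and that is the obstacle described for Pub/Sub above.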
