I have a simple pipeline that reads from a Pub/Sub topic and writes to BigQuery. I would like to introduce a 5-minute delay between reading a message from the topic and writing it to BigQuery.

I thought I could do this using a trigger, along the lines of the code below. However, the message still goes straight through with no delay.

PCollection<PubsubMessage> windowed_inputEvents =
    inputEvents.apply(
        Window.<PubsubMessage>into(FixedWindows.of(Duration.standardMinutes(1)))                  
              .triggering(
                  AfterProcessingTime
                      .pastFirstElementInPane()
                      .plusDelayOf(Duration.standardMinutes(5)))
              .withAllowedLateness(Duration.standardMinutes(1))
              .discardingFiredPanes());

Is it possible to create such a delay using triggers?

Thanks

Kenn Knowles
TheCat

1 Answer

It looks like you are mixing up a couple of things. In your example you have a fixed window of 1 minute, which means that at the end of the window all the data elements that are part of the window are emitted.

Triggers are essentially additional levers that let you emit data before a window closes. They cannot hold data back after the window has closed. For example, if the window spans 12:00 to 12:01 and the first element arrives at 12:00, the element is emitted when the window closes at 12:01; it is not held back until 12:05.
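To make the 12:00 to 12:01 example concrete, here is a rough sketch in plain Java (no Beam dependency) of how a fixed window's bounds follow from an element's timestamp. The class and method names are made up for illustration, and the date is arbitrary; this only mirrors the usual "floor to a multiple of the window size" arithmetic and is not Beam's actual implementation.

```java
import java.time.Duration;
import java.time.Instant;

// Hypothetical helper, not part of Beam: illustrates fixed-window arithmetic only.
class FixedWindowSketch {

    // Start of the fixed window containing `timestamp`, for windows aligned to the epoch.
    public static Instant windowStart(Instant timestamp, Duration size) {
        long sizeMillis = size.toMillis();
        long ts = timestamp.toEpochMilli();
        return Instant.ofEpochMilli(ts - Math.floorMod(ts, sizeMillis));
    }

    // Exclusive end of that window.
    public static Instant windowEnd(Instant timestamp, Duration size) {
        return windowStart(timestamp, size).plus(size);
    }

    public static void main(String[] args) {
        Instant element = Instant.parse("2021-08-01T12:00:00Z");
        Duration oneMinute = Duration.ofMinutes(1);
        System.out.println(windowStart(element, oneMinute)); // 2021-08-01T12:00:00Z
        System.out.println(windowEnd(element, oneMinute));   // 2021-08-01T12:01:00Z
    }
}
```

With a 1-minute window, the window containing a 12:00 element ends at 12:01, which is well before the 5-minute mark the trigger in the question is aiming for.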

To meet your requirements you can do one of two things:

  1. Increase the window size so that it is longer than the retention period; you can then emit the data elements with the required delay.
  2. If this is not possible, BigQueryIO has a FILE_LOADS method that you can use to write data into BigQuery in batches. This API also accepts a time duration via withTriggeringFrequency. More details can be found here: https://beam.apache.org/releases/javadoc/2.2.0/org/apache/beam/sdk/io/gcp/bigquery/BigQueryIO.Write.html#withTriggeringFrequency-org.joda.time.Duration-
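As a rough sketch of the second option, the sink configuration might look something like the following. It assumes the Beam GCP IO module is on the classpath, that the Pub/Sub messages have already been converted to TableRow, and that the target table already exists; the class name, method name, `rows`, and `tableSpec` are placeholders, not anything from the question. Note that this batches writes roughly every 5 minutes, so an element waits up to that long rather than exactly 5 minutes.

```java
import com.google.api.services.bigquery.model.TableRow;
import org.apache.beam.sdk.io.gcp.bigquery.BigQueryIO;
import org.apache.beam.sdk.values.PCollection;
import org.joda.time.Duration;

// Placeholder class: shows only the BigQuery sink configuration.
public class BatchedBigQueryWrite {

    // `rows` is the already-converted input; `tableSpec` is e.g. "project:dataset.table".
    static void writeInBatches(PCollection<TableRow> rows, String tableSpec) {
        rows.apply(
            "WriteToBigQuery",
            BigQueryIO.writeTableRows()
                .to(tableSpec)
                // Use load jobs instead of streaming inserts.
                .withMethod(BigQueryIO.Write.Method.FILE_LOADS)
                // Start a load job roughly every 5 minutes.
                .withTriggeringFrequency(Duration.standardMinutes(5))
                // A shard count is expected alongside a triggering frequency
                // for unbounded input; 1 is an arbitrary choice here.
                .withNumFileShards(1)
                // Assumes the table exists, so no schema is supplied.
                .withCreateDisposition(BigQueryIO.Write.CreateDisposition.CREATE_NEVER)
                .withWriteDisposition(BigQueryIO.Write.WriteDisposition.WRITE_APPEND));
    }
}
```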
Jayadeep Jayaraman
  • I would just use a session window for this. Each element could be its own key and the length of the window could be set to 5 minutes. – Matt Welke Aug 01 '21 at 02:57