I am gathering tracking information about flights. The maximum length of a flight is 10 hours, and I receive a tracking event roughly every minute. The order of the events gets disturbed during processing in Apache Beam. After merging all the data for a flight, I want to push it to BigQuery and then discard it so it doesn't consume memory.
I have two strategies for doing this:
1) Wait 1 h, and if no new data has arrived, push the collected data to BQ (a rough sketch of how I picture this is below the list).
2) Every 15 minutes, run my own algorithm that verifies whether the data is complete.
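For option 1), my understanding is that "no new data for 1 h" corresponds roughly to session windows with a one-hour gap. The following is only a sketch of that idea, and I'm not sure it is actually equivalent to what my code below does (tracking_informations is the keyed PCollection from the code further down):

merged_per_flight = (
    tracking_informations
    | beam.WindowInto(window.Sessions(1 * 3600))  # a session closes after a 1 h gap in event time
    | beam.GroupByKey())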
I want to go with 1) because it's simpler. Is the code below correct?
import apache_beam as beam
from apache_beam import window
from apache_beam.transforms.trigger import AccumulationMode, AfterProcessingTime

models = (xmls | beam.FlatMap(process_xmls))
tracking_informations = models | beam.ParDo(FilterTI())
grouped_tis = (
    tracking_informations
    | beam.WindowInto(window.FixedWindows(10 * 3600),          # 10 h windows = maximum flight length
                      trigger=AfterProcessingTime(1 * 3600),   # fire 1 h (processing time) after the first element
                      accumulation_mode=AccumulationMode.DISCARDING)
    | beam.GroupByKey()
    | "push and merge to BQ")  # placeholder for the final merge + BigQuery write step
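The last step, "push and merge to BQ", is just a placeholder. What I have in mind is roughly the sketch below, where per_flight stands for the output of beam.GroupByKey(), and merge_tis plus the table name are made up for illustration, not code I have yet:

rows = per_flight | "merge per flight" >> beam.Map(
    lambda kv: merge_tis(kv[0], list(kv[1])))  # kv = (flight_id, tracking infos); merge into one BQ row dict

rows | "write to BQ" >> beam.io.WriteToBigQuery(
    "my_project:my_dataset.flights",
    write_disposition=beam.io.BigQueryDisposition.WRITE_APPEND,
    create_disposition=beam.io.BigQueryDisposition.CREATE_NEVER)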