
I wanted to clear up my understanding of the following use case.

Basically, I am running a Flink batch job. My requirement is the following: I have 10 tables of raw data in PostgreSQL, I want to aggregate that data with a 10-minute tumbling window, and I need to store the aggregated data into aggregated PostgreSQL tables.

My pseudocode looks somewhat like this:

initialize StreamExecutionEnvironment, StreamTableEnvironment
load all the configs from file
configs.foreach(
    load data from table
    aggregate
    store data
    delete temporary views created
)
streamExecutionEnvironment.execute()
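
In concrete terms, a single iteration of that loop could look roughly like the sketch below. This is only a minimal sketch, assuming the Flink JDBC connector and the PostgreSQL driver are on the classpath; the table names (raw_events, agg_events), the columns (user_id, amount, event_time) and the connection options are placeholder values, not my real schema.

    import org.apache.flink.api.common.RuntimeExecutionMode;
    import org.apache.flink.streaming.api.environment.StreamExecutionEnvironment;
    import org.apache.flink.table.api.bridge.java.StreamTableEnvironment;

    public class AggregateOneTable {
        public static void main(String[] args) throws Exception {
            StreamExecutionEnvironment env = StreamExecutionEnvironment.getExecutionEnvironment();
            env.setRuntimeMode(RuntimeExecutionMode.BATCH); // batch job over bounded JDBC input
            StreamTableEnvironment tEnv = StreamTableEnvironment.create(env);

            // "load data from table": expose the raw PostgreSQL table to Flink SQL
            tEnv.executeSql(
                "CREATE TEMPORARY TABLE raw_events (" +
                "  user_id BIGINT, amount DOUBLE, event_time TIMESTAMP(3)" +
                ") WITH (" +
                "  'connector' = 'jdbc'," +
                "  'url' = 'jdbc:postgresql://localhost:5432/mydb'," +
                "  'table-name' = 'raw_events'," +
                "  'username' = 'user', 'password' = 'secret')");

            // target table for the aggregated rows
            tEnv.executeSql(
                "CREATE TEMPORARY TABLE agg_events (" +
                "  user_id BIGINT, window_start TIMESTAMP(3), total_amount DOUBLE" +
                ") WITH (" +
                "  'connector' = 'jdbc'," +
                "  'url' = 'jdbc:postgresql://localhost:5432/mydb'," +
                "  'table-name' = 'agg_events'," +
                "  'username' = 'user', 'password' = 'secret')");

            // "aggregate" + "store data": 10-minute tumbling window written to the sink
            tEnv.executeSql(
                "INSERT INTO agg_events " +
                "SELECT user_id, TUMBLE_START(event_time, INTERVAL '10' MINUTE), SUM(amount) " +
                "FROM raw_events " +
                "GROUP BY user_id, TUMBLE(event_time, INTERVAL '10' MINUTE)").await();

            // "delete temporary views created"
            tEnv.executeSql("DROP TEMPORARY TABLE raw_events");
            tEnv.executeSql("DROP TEMPORARY TABLE agg_events");
        }
    }

One thing I am not sure about: executeSql on an INSERT submits that statement as its own job right away, so with this exact pattern the final streamExecutionEnvironment.execute() is not what triggers the work. Collecting all 10 INSERT statements in a tEnv.createStatementSet() and calling execute() on that once would be closer to my pseudocode, where everything goes into a single job.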

Everything works fine for now. Still, I have one question: I think with this approach all the load functions would be executed simultaneously, so it would put load on Flink, right, since all the data is getting loaded at the same time? Or is my understanding wrong and the data would get loaded, processed, and stored one by one? Please guide.

  • Flink will process the data sources (10 tables) as streams, one row at a time; however, it will distribute the processing across its whole cluster. If the data loaded from the tables is larger than the cluster capacity and the aggregation logic joins every dataset, it might at some point overload the Flink cluster. You can monitor for backpressure in the Flink console. More details here - https://nightlies.apache.org/flink/flink-docs-master/docs/ops/monitoring/back_pressure/ – Shankar Dec 29 '22 at 20:47
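
Following up on the comment above: if reading all 10 tables concurrently turns out to be too heavy, the Flink JDBC table source also has scan options that bound how each source reads its data. This is only an illustration of those connector options (reusing tEnv from the sketch above); the partition column, bounds, and sizes are made-up example values.

    // 'scan.partition.*' splits the read into parallel range queries,
    // 'scan.fetch-size' limits rows fetched per round trip from PostgreSQL.
    tEnv.executeSql(
        "CREATE TEMPORARY TABLE raw_events (" +
        "  user_id BIGINT, amount DOUBLE, event_time TIMESTAMP(3)" +
        ") WITH (" +
        "  'connector' = 'jdbc'," +
        "  'url' = 'jdbc:postgresql://localhost:5432/mydb'," +
        "  'table-name' = 'raw_events'," +
        "  'username' = 'user', 'password' = 'secret'," +
        "  'scan.partition.column' = 'user_id'," +
        "  'scan.partition.num' = '4'," +
        "  'scan.partition.lower-bound' = '0'," +
        "  'scan.partition.upper-bound' = '10000'," +
        "  'scan.fetch-size' = '1000')");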

0 Answers