I am new to Kubernetes and Flink, and I want to use them for some batch processing. I'd like to set up a Flink job on EKS: I have about 2.5 TB of data that needs aggregations performed on it every 30 minutes (overall, I intend to process about 120 TB of data per day from several IoT devices). This data can be partitioned by customer (~5000 customers).
How can I submit a batch job to the Flink cluster per customer, where the source is an S3 bucket that is already partitioned by customer and the sink is another S3 bucket that holds the aggregated customer data?
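For context, this is roughly what I imagine a single customer's batch job looking like (the bucket names, `customer=` prefix layout, record format, and the actual aggregation logic are placeholders, not my real setup):

```java
import org.apache.flink.api.common.RuntimeExecutionMode;
import org.apache.flink.api.common.eventtime.WatermarkStrategy;
import org.apache.flink.api.common.serialization.SimpleStringEncoder;
import org.apache.flink.connector.file.sink.FileSink;
import org.apache.flink.connector.file.src.FileSource;
import org.apache.flink.connector.file.src.reader.TextLineInputFormat;
import org.apache.flink.core.fs.Path;
import org.apache.flink.streaming.api.environment.StreamExecutionEnvironment;

public class CustomerAggregationJob {
    public static void main(String[] args) throws Exception {
        String customerId = args[0];  // passed as a program argument per submission

        StreamExecutionEnvironment env = StreamExecutionEnvironment.getExecutionEnvironment();
        env.setRuntimeMode(RuntimeExecutionMode.BATCH);  // bounded job: runs to completion and exits

        // Source: this customer's partition of the raw input bucket (hypothetical layout)
        FileSource<String> source = FileSource
                .forRecordStreamFormat(new TextLineInputFormat(),
                        new Path("s3://iot-raw/customer=" + customerId + "/"))
                .build();

        // Sink: this customer's partition of the aggregated output bucket (hypothetical layout)
        FileSink<String> sink = FileSink
                .forRowFormat(new Path("s3://iot-aggregated/customer=" + customerId + "/"),
                        new SimpleStringEncoder<String>("UTF-8"))
                .build();

        env.fromSource(source, WatermarkStrategy.noWatermarks(), "s3-input")
           // ... parse the records and perform the 30-minute aggregations here ...
           .sinkTo(sink);

        env.execute("aggregation-" + customerId);
    }
}
```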
Can I use the RestClusterClient for this purpose? Or could I build a Flink client as a separate pod in the Flink cluster that submits the jobs based on some trigger (EventBridge/SQS)?
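This is roughly how I picture the submission side if RestClusterClient is the right tool, e.g. running in a client pod that reacts to an SQS/EventBridge trigger (the JobManager service name, jar path, parallelism, and customer argument are all placeholders I made up for the sketch):

```java
import java.io.File;

import org.apache.flink.client.deployment.StandaloneClusterId;
import org.apache.flink.client.program.PackagedProgram;
import org.apache.flink.client.program.PackagedProgramUtils;
import org.apache.flink.client.program.rest.RestClusterClient;
import org.apache.flink.configuration.Configuration;
import org.apache.flink.configuration.RestOptions;
import org.apache.flink.runtime.jobgraph.JobGraph;

public class CustomerJobSubmitter {
    public static void main(String[] args) throws Exception {
        String customerId = args[0];  // e.g. taken from an SQS/EventBridge message

        // Point the client at the JobManager's REST endpoint (k8s service name is a placeholder)
        Configuration config = new Configuration();
        config.set(RestOptions.ADDRESS, "flink-jobmanager");
        config.set(RestOptions.PORT, 8081);

        try (RestClusterClient<StandaloneClusterId> client =
                     new RestClusterClient<>(config, StandaloneClusterId.getInstance())) {

            // Build a JobGraph from the packaged job jar, passing the customer id as an argument
            PackagedProgram program = PackagedProgram.newBuilder()
                    .setJarFile(new File("/opt/jobs/customer-aggregation-job.jar"))
                    .setArguments(customerId)
                    .build();
            JobGraph jobGraph = PackagedProgramUtils.createJobGraph(program, config, 4, false);

            // Submit and print the resulting JobID
            System.out.println("Submitted job: " + client.submitJob(jobGraph).get());
        }
    }
}
```

Is this the right direction, or is there a better-supported way to trigger ~5000 of these per 30-minute window?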