I am working on a simple aggregation that sums the totals of events happening on a given resource (see: Calculate totals and emit periodically in flink). With some help I got this to work, but I am now hitting another issue.
I am trying to calculate totals for the lifetime of a resource, but I am reading events from a Kinesis stream that has a retention period of 24 hours. Because I don't have access to events that happened before that, I need to bootstrap my state from a legacy (batch) system that calculates totals once a day.
Essentially, I'd like to somehow bootstrap the state from the legacy system (loading stats for yesterday), then join today's data from the Kinesis stream on top of that, avoiding duplication in the process. Ideally this would be a one-off process, and the application would run from Kinesis alone from then onwards.
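To make the intended merge concrete, here is a minimal plain-Java sketch of the logic, deliberately independent of the Flink API. It assumes (these are assumptions, not part of the setup above) that every event carries an event-time timestamp and that the legacy batch totals are complete for all events strictly before a known cutoff, for example midnight. Streamed events older than the cutoff, which Kinesis may still hold within its 24-hour retention, are skipped so they are not counted twice. The names `BootstrapMerge`, `Event`, `cutoffMillis` and `totalFor` are hypothetical. In a real Flink job the per-resource map would presumably become keyed state (for example a `ValueState<Long>` in a `KeyedProcessFunction`).

```java
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Hypothetical sketch: lifetime total = legacy bootstrap total + streamed events at/after the cutoff.
class BootstrapMerge {

    // Hypothetical event shape: resource id, event-time timestamp (epoch millis), amount to add.
    record Event(String resourceId, long timestampMillis, long amount) {}

    private final long cutoffMillis;
    private final Map<String, Long> totals = new HashMap<>();

    // bootstrapTotals: per-resource totals from the legacy batch job,
    // ASSUMED to cover every event with timestamp < cutoffMillis.
    BootstrapMerge(Map<String, Long> bootstrapTotals, long cutoffMillis) {
        this.cutoffMillis = cutoffMillis;
        this.totals.putAll(bootstrapTotals);
    }

    // Applies one streamed event. Events before the cutoff are already included
    // in the bootstrap totals, so they are dropped to avoid double counting.
    void onEvent(Event e) {
        if (e.timestampMillis() < cutoffMillis) {
            return;
        }
        totals.merge(e.resourceId(), e.amount(), Long::sum);
    }

    // Current lifetime total for a resource (0 if it has never been seen).
    long totalFor(String resourceId) {
        return totals.getOrDefault(resourceId, 0L);
    }

    public static void main(String[] args) {
        BootstrapMerge merge = new BootstrapMerge(Map.of("r1", 50L), 1_000L);
        List<Event> stream = List.of(
            new Event("r1", 900L, 5L),   // before cutoff: already in bootstrap, skipped
            new Event("r1", 1_000L, 7L), // at cutoff: counted
            new Event("r2", 1_200L, 3L)  // resource with no bootstrap total
        );
        stream.forEach(merge::onEvent);
        System.out.println(merge.totalFor("r1") + " " + merge.totalFor("r2"));
    }
}
```

Running `main` should print `57 3`: resource `r1` keeps its bootstrap total of 50 plus only the post-cutoff event, and `r2` starts from zero. Filtering on a single cutoff timestamp is what makes this a one-off step: once the stream has moved past the cutoff, every event is simply added.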
I'm happy to provide more details if I missed something.
Thanks