I have a set of external services that I want to poll continuously at short intervals (about 30 seconds) from AWS. For example, I have a set of Git repos that I want to poll for changes to trigger CI pipelines.
My requirements are:
- I want the solution to be scalable (let's assume I want to poll thousands of Git repos).
- I want the solution to be self-healing. It is OK if external services are not polled for a short time (because of some failure), but polling should resume shortly afterwards.
- I need some external way of removing and adding external services that should be polled.
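To make requirements 2 and 3 concrete, a single polling pass could look roughly like the sketch below. It assumes the list of services lives in some external registry (for example a DynamoDB table or an S3 object); `load_services` and `poll_one` are hypothetical callables standing in for "read the registry" and "check one repo and report whether it changed".

```python
from typing import Callable, Dict, Iterable


def poll_cycle(
    load_services: Callable[[], Iterable[str]],
    poll_one: Callable[[str], bool],
) -> Dict[str, str]:
    """Run one polling pass and report the outcome per service.

    The service list is re-read on every pass, so services added to or
    removed from the external registry take effect on the next cycle.
    A failure while polling one service is recorded and does not stop
    the remaining services from being polled.
    """
    results: Dict[str, str] = {}
    for url in load_services():
        try:
            results[url] = "changed" if poll_one(url) else "unchanged"
        except Exception as exc:  # isolate per-service failures
            results[url] = f"error: {exc}"
    return results
```

A worker process would call this in a loop, e.g. `while True: poll_cycle(...); time.sleep(30)`. This only covers failures of individual polls; recovering from the worker process itself dying is the harder part.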
What would be a cost-efficient way of implementing this in AWS?
My thoughts on a solution
The obvious approach is to start a set of EC2 instances (maybe one for every 100 services I want to poll) and distribute the services among them. For self-healing, one approach would be an Auto Scaling group.
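A minimal sketch of what "distribute the services among them" could mean with static assignment: at about 100 services per instance, N services need ceil(N/100) instances (3,000 repos polled every 30 seconds would be 30 instances and about 100 poll requests per second overall), and each instance keeps only the services whose stable hash, modulo the instance count, equals its own index. The function names are hypothetical; `my_index` is exactly the unique, recoverable ID this approach depends on.

```python
import hashlib
import math
from typing import List

SERVICES_PER_INSTANCE = 100  # the "one instance per 100 services" ratio from the question


def instance_count(num_services: int) -> int:
    # At least one instance, otherwise ceil(N / 100).
    return max(1, math.ceil(num_services / SERVICES_PER_INSTANCE))


def owner(service: str, num_instances: int) -> int:
    # Use a stable hash: Python's built-in hash() of a str is salted per process,
    # so different instances would disagree about ownership.
    digest = hashlib.sha256(service.encode("utf-8")).digest()
    return int.from_bytes(digest[:8], "big") % num_instances


def my_services(services: List[str], my_index: int, num_instances: int) -> List[str]:
    # Every instance must know its own index in [0, num_instances).
    return [s for s in services if owner(s, num_instances) == my_index]
```

With hash-mod assignment the split is only approximately 100 per instance, and changing the instance count reassigns most services; consistent hashing would reduce that reshuffling, but it still needs each instance to know who it is.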
But for this to work, every instance in the Auto Scaling group would need to know which services it is responsible for, which means every instance would need a unique ID that it can recover after being restarted. From what I read here, that is not good practice.
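One commonly used way around the stable-ID problem is lease-based work claiming, sketched here as an alternative to static assignment rather than something settled above. Services are grouped into a fixed number of shards; each worker gets a random ID at startup, renews leases on the shards it holds, and claims shards whose lease is missing or expired. If an instance dies, its leases expire after `LEASE_TTL` and a replacement (for example one started by the Auto Scaling group) picks them up. `LeaseTable`, `Worker` and the TTL value are hypothetical; in AWS the shared table could be something like a DynamoDB table updated with conditional writes, in which case `try_acquire` must be an atomic compare-and-set. This single-process version is only atomic because it is not concurrent.

```python
import uuid
from dataclasses import dataclass
from typing import Dict, List

LEASE_TTL = 90.0  # seconds; hypothetical value, a few poll intervals


@dataclass
class Lease:
    owner: str
    expires_at: float


class LeaseTable:
    """In-memory stand-in for a shared table with conditional writes."""

    def __init__(self) -> None:
        self._leases: Dict[int, Lease] = {}

    def try_acquire(self, shard: int, worker: str, now: float) -> bool:
        # Succeeds if the shard is free, its lease expired, or we already hold it (renewal).
        current = self._leases.get(shard)
        if current is None or current.expires_at <= now or current.owner == worker:
            self._leases[shard] = Lease(worker, now + LEASE_TTL)
            return True
        return False


class Worker:
    def __init__(self, table: LeaseTable, num_shards: int, max_shards: int) -> None:
        self.id = str(uuid.uuid4())  # random per start: no stable identity needed
        self.table = table
        self.num_shards = num_shards
        self.max_shards = max_shards
        self.owned: List[int] = []

    def tick(self, now: float) -> List[int]:
        # Renew held shards, dropping any whose lease another worker has taken over.
        self.owned = [s for s in self.owned if self.table.try_acquire(s, self.id, now)]
        # Claim free or expired shards until we reach our limit.
        for shard in range(self.num_shards):
            if len(self.owned) >= self.max_shards:
                break
            if shard not in self.owned and self.table.try_acquire(shard, self.id, now):
                self.owned.append(shard)
        return sorted(self.owned)
```

For example, with 4 shards and 2 workers allowed 2 shards each, the first worker claims shards 0 and 1 and the second claims 2 and 3. If the second worker stops ticking, a replacement started shortly afterwards gets nothing until those leases expire, then takes over shards 2 and 3; each worker would poll the services in its shards every 30 seconds.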