
I have a set of external services that I want to poll continuously at short intervals (about 30 seconds) from AWS. For example, I have a set of Git repos that I want to poll for changes to trigger CI pipelines.

My requirements are:

  • I want the solution to be scalable (let's assume I want to poll thousands of Git repos).
  • I want the solution to be self-healing. It is OK if external services are not polled for a short time (because of some failure), but after a short time polling should start again.
  • I need some external way of adding and removing the external services that should be polled.

What would be a cost-efficient way of implementing this in AWS?

My thoughts on a solution

The obvious approach is to start a set of EC2 instances (maybe one for every 100 services I want to poll) and distribute the services among them. For self-healing, one approach would be an autoscaling group.

But for this to work, every instance in the autoscaling group would need to know which services it is responsible for, which means every instance would need a unique ID it can recover after being restarted. From here I read that this is not good practice.
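One way around per-instance identity is to make the assignment a pure function of the current group membership, for example rendezvous (highest-random-weight) hashing: every instance runs the same calculation over the same worker list, so nobody needs a stable, recoverable ID. A minimal sketch (the worker and repo names are made up):

```python
import hashlib

def owner(service_id: str, workers: list) -> str:
    """Pick the worker with the highest hash score for this service.

    Rendezvous (HRW) hashing: the winner depends only on the pair
    (worker, service), so when one worker disappears, only the
    services it owned get redistributed; everything else stays put.
    """
    return max(workers, key=lambda w: hashlib.sha256(
        f"{w}:{service_id}".encode()).hexdigest())

# Hypothetical membership list, e.g. instance IDs read from the ASG API
workers = ["i-aaa", "i-bbb", "i-ccc"]
assignment = {f"repo-{n}": owner(f"repo-{n}", workers) for n in range(6)}
```

Each instance could periodically fetch the membership list from the autoscaling group and poll only the services assigned to it, which gives the self-healing behaviour without any instance-local state.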

Nathan

2 Answers


What is your actual use case? You gave an example of monitoring git repos, but the solution could be different if this isn't what you're actually doing.

Serverless

This sounds like a textbook use case for Lambda / serverless compute. Is that an option for you?

Push

Having whatever receives the change push to you is much more efficient than polling (as pointed out by MLu ;) ). If your workload really is monitoring Git repositories (you gave it as an example), they can push to you when something changes. I don't know if you can do this if you don't own the repo, but it's definitely worth considering.

Pushing to an SQS queue, consumed by some other mechanism such as Lambda or an EC2 instance, would be cheap and reliable.
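As a sketch of that idea, a Lambda behind API Gateway could translate a git host's webhook into an SQS message. The queue URL and the webhook payload shape below are assumptions for illustration, not any particular provider's real format:

```python
import json

QUEUE_URL = "https://sqs.us-east-1.amazonaws.com/123456789012/ci-jobs"  # placeholder

def make_message(webhook_body: str) -> str:
    """Extract just what the CI workers need from the (assumed) webhook JSON."""
    payload = json.loads(webhook_body)
    return json.dumps({
        "repo": payload["repository"]["clone_url"],
        "ref": payload.get("ref", ""),
    })

def handler(event, context):
    """Lambda entry point for an API Gateway proxy integration."""
    import boto3  # available in the Lambda runtime
    boto3.client("sqs").send_message(QueueUrl=QUEUE_URL,
                                     MessageBody=make_message(event["body"]))
    return {"statusCode": 202, "body": "queued"}
```

You only pay per request this way, so thousands of mostly idle repos cost almost nothing.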

EC2 Options

If you must use EC2 instances, I can think of a couple of options.

EC2 Option One

  • A publisher that pushes the required checks onto a queue: a small EC2 instance, or a Lambda triggered by CloudWatch Events

  • An autoscaling group of consumers that scales based on the number of messages in the queue
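A sketch of the publisher side, assuming the repo list comes from somewhere like a config file. SQS's send_message_batch accepts at most 10 entries per call, which is where the batching saving below comes from:

```python
import json

def to_batches(repos, size=10):
    """Group repo checks into SQS send_message_batch entries.

    SQS allows at most 10 entries per batch call, and each batch
    call is billed as one request rather than ten.
    """
    for start in range(0, len(repos), size):
        yield [{"Id": str(start + i), "MessageBody": json.dumps({"repo": r})}
               for i, r in enumerate(repos[start:start + size])]

def publish(repos, queue_url):
    """Push one round of checks; run this on a schedule."""
    import boto3  # only needed when actually publishing
    sqs = boto3.client("sqs")
    for batch in to_batches(repos):
        sqs.send_message_batch(QueueUrl=queue_url, Entries=batch)
```

The consumers then delete each message once the corresponding check has finished, so a crashed consumer's work reappears on the queue after the visibility timeout.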

A downside to this is cost. 5000 checks every 30 seconds is 14.4M messages a day, which is about $5.76 per day in SQS fees at $0.40 per million requests. If you batch them into groups of 10, that drops to roughly 58c per day. Seems like running your own queue on a t3a.nano instance might be cheaper, but then you have to set it up and manage it.
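The arithmetic behind those numbers, assuming SQS standard-queue pricing of $0.40 per million requests:

```python
checks = 5000
messages_per_day = checks * (24 * 60 * 60 // 30)   # one check per repo every 30 s
cost_per_request = 0.40 / 1_000_000                # assumed SQS standard-queue rate

unbatched = messages_per_day * cost_per_request
batched = (messages_per_day / 10) * cost_per_request  # 10 checks per message

print(messages_per_day)   # 14400000
print(round(unbatched, 2))
print(round(batched, 2))
```

Note this counts only the sends; receives and deletes by the consumers are billed as requests too, so the real bill is a small multiple of this.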

EC2 Option Two

Similar to the above, but store the required jobs in DynamoDB and have the servers poll for jobs. It'd be more fiddly to set up, since you're basically using a database as a queue.
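To use a database as a queue safely, the workers need to claim jobs atomically, which in DynamoDB is usually done with a conditional update acting as a lease. A sketch that just builds the UpdateItem parameters (the table and attribute names are made up):

```python
import time

def claim_job_params(table: str, job_id: str, worker_id: str, lease_seconds=60):
    """Parameters for a DynamoDB UpdateItem call that claims a job.

    The condition only passes if the job has never been claimed or its
    previous lease has expired, so two workers can't grab the same job,
    and a crashed worker's jobs become claimable again once the lease
    runs out (the self-healing part).
    """
    now = int(time.time())
    return {
        "TableName": table,
        "Key": {"job_id": {"S": job_id}},
        "UpdateExpression": "SET #w = :w, #t = :t",
        "ConditionExpression": "attribute_not_exists(#t) OR #t < :now",
        "ExpressionAttributeNames": {"#w": "worker_id", "#t": "lease_until"},
        "ExpressionAttributeValues": {
            ":w": {"S": worker_id},
            ":t": {"N": str(now + lease_seconds)},
            ":now": {"N": str(now)},
        },
    }

# A worker would pass this to boto3: client.update_item(**claim_job_params(...))
# and treat ConditionalCheckFailedException as "someone else got this job".
```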

EC2 Option Three

You could store the checks to run in S3, with a Lambda watching the object for modifications. When the object changes, the Lambda divides up the jobs and updates the configuration on your workers.
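S3 event notifications can invoke a Lambda when the object is overwritten; the dividing step itself is simple. A sketch, where the JSON layout of the config object is an assumption:

```python
import json

def divide_jobs(config_json: str, n_workers: int):
    """Split the repo list from the (assumed) S3 config object into one
    roughly equal slice per worker, round-robin so that repos appended
    at the end of the file don't reshuffle existing assignments."""
    repos = json.loads(config_json)["repos"]
    return [repos[i::n_workers] for i in range(n_workers)]
```

The Lambda would then write each slice to wherever the matching worker reads its configuration from (SSM Parameter Store, a per-worker S3 key, etc.).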

There are many other solutions, probably better than these, but that's a few ideas without too much thought.

Tim

A lightweight checker / publisher sending to a queue, plus a bunch of consumers / workers, as @Tim suggests, is the correct pattern for this.

However, if your intent is really to trigger CI/CD from Git, consider configuring a post-receive hook in the repositories; it will consume far fewer resources.
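For illustration: a post-receive hook is just an executable in the server-side repo's hooks/ directory, and Git feeds it one "<old-sha> <new-sha> <ref>" line per updated ref on stdin. A minimal Python version that notifies a hypothetical CI endpoint:

```python
#!/usr/bin/env python3
"""hooks/post-receive: runs on the git server after every push."""
import json
import sys
import urllib.request

CI_ENDPOINT = "https://ci.example.com/trigger"  # hypothetical

def parse_updates(lines):
    """Turn git's '<old> <new> <ref>' stdin lines into a payload."""
    updates = []
    for line in lines:
        old, new, ref = line.split()
        updates.append({"ref": ref, "before": old, "after": new})
    return updates

if __name__ == "__main__":
    body = json.dumps({"updates": parse_updates(sys.stdin)}).encode()
    req = urllib.request.Request(
        CI_ENDPOINT, data=body,
        headers={"Content-Type": "application/json"})
    urllib.request.urlopen(req, timeout=10)
```

Hosted services expose the same idea as webhooks, so you don't need shell access to the server to get push-based triggering.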

Also, most Git hosting services like GitHub, GitLab or Bitbucket publish a feed of recent changes, RSS or similar. Rather than checking thousands of Git repos with git something ... every 30 seconds, do a single HTTP request to fetch the latest changes from the feed.
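A sketch of diffing such a feed, assuming Atom (GitHub's commit feeds are Atom, for example): remembering the entry IDs you've already seen tells you which repos changed since the last poll.

```python
import xml.etree.ElementTree as ET

ATOM = "{http://www.w3.org/2005/Atom}"

def new_entries(feed_xml: str, seen_ids: set):
    """Return IDs of Atom feed entries we have not seen before."""
    root = ET.fromstring(feed_xml)
    ids = [entry.findtext(ATOM + "id") for entry in root.iter(ATOM + "entry")]
    return [i for i in ids if i not in seen_ids]
```

Combine this with a conditional GET (send the last ETag in an If-None-Match header) and an unchanged feed costs you a single 304 response per poll.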

Hope that helps :)

MLu