Handling short-lived bursty traffic on AWS

Question

We have a mobile app backend server using Elastic Beanstalk autoscaling with 4 t2.small instances.

When we send out push notifications, they cause a large, short-lived spike in traffic to the servers. Since autoscaling takes ~3 minutes to kick in, it's of little use for these spikes.

How can we reduce the latency during these spikes without burning excessive CPU/$ during the lower traffic times?

What about... sending out your push notifications over a longer period of time? — Michael - sqlbot, Sep 07 '17 at 23:27
Without knowing what your application is doing and a lot of detail about it, it's impossible to say what you might need to do. @Tim's answer seems promising, but also... API Gateway and Lambda are promising, and CloudFront caching and deduplication of identical requests during traffic spikes is promising... but it's hard to say which of those might be applicable. — Michael - sqlbot, Sep 08 '17 at 20:02

score 2 · Answer 1 · answered Sep 07 '17 at 22:29

I don't think you can rely on auto scaling. AWS has a page on manual scaling which you should read.

You could make use of scheduled scaling, set up to scale out before your notifications go out.
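As a rough sketch of the scheduling idea, assuming you use boto3 and know the send time in advance, you could plan two scheduled actions: one that scales out shortly before the send and one that scales back in afterwards. The group name, action names, sizes, and the lead/hold intervals here are all made up for illustration, and the burst size is assumed to be within the group's maximum size. With Elastic Beanstalk you may prefer to express the same thing through the environment's aws:autoscaling:scheduledaction option settings, since changes made directly to the Auto Scaling group can be overwritten by environment updates.

```python
from datetime import datetime, timedelta, timezone


def build_scheduled_actions(group_name, send_time, burst_size, normal_size,
                            lead=timedelta(minutes=10),
                            hold=timedelta(minutes=30)):
    """Plan a scale-out before the send and a scale-in after it.

    Returns keyword-argument dicts for put_scheduled_update_group_action.
    The action names and the default lead/hold values are illustrative.
    """
    scale_out = {
        "AutoScalingGroupName": group_name,
        "ScheduledActionName": "pre-push-scale-out",
        "StartTime": send_time - lead,   # early enough for instances to boot
        "MinSize": burst_size,
        "DesiredCapacity": burst_size,
    }
    scale_in = {
        "AutoScalingGroupName": group_name,
        "ScheduledActionName": "post-push-scale-in",
        "StartTime": send_time + hold,   # after the spike is expected to pass
        "MinSize": normal_size,
        "DesiredCapacity": normal_size,
    }
    return [scale_out, scale_in]


def schedule_burst(actions):
    """Register the planned actions with the Auto Scaling API."""
    import boto3  # imported lazily so planning works without AWS access
    client = boto3.client("autoscaling")
    for action in actions:
        client.put_scheduled_update_group_action(**action)


# Example (hypothetical group name and sizes):
# send = datetime(2017, 9, 8, 12, 0, tzinfo=timezone.utc)
# schedule_burst(build_scheduled_actions("my-asg", send, 8, 4))
```

The lead time should comfortably exceed how long a new instance takes to launch and pass health checks, which the question puts at about 3 minutes.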

You could simply start more servers manually, add them to the load balancer, and stop them manually when they're no longer required. This can be done with the console or using a script that calls the API.

You could change the minimum group size using the console or API before you send a notification.
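A minimal sketch of the API route for changing the minimum group size, assuming boto3 and a script that runs just before you send: raise the minimum, wait until enough instances are in service, send, then lower the minimum again once the spike has passed. The group name and sizes are placeholders, and the new minimum is assumed not to exceed the group's maximum size.

```python
def set_min_size(client, group_name, min_size):
    """Change the group's minimum size.

    If the new minimum is above the current desired capacity, the group
    launches instances to reach it.
    """
    client.update_auto_scaling_group(
        AutoScalingGroupName=group_name,
        MinSize=min_size,
    )


def in_service_count(client, group_name):
    """Count the instances the group currently reports as InService."""
    response = client.describe_auto_scaling_groups(
        AutoScalingGroupNames=[group_name]
    )
    instances = response["AutoScalingGroups"][0]["Instances"]
    return sum(1 for inst in instances if inst["LifecycleState"] == "InService")


# Example flow (hypothetical group name and sizes):
# import boto3, time
# client = boto3.client("autoscaling")
# set_min_size(client, "my-asg", 8)
# while in_service_count(client, "my-asg") < 8:
#     time.sleep(15)
# ... send the notifications and wait for the spike to pass ...
# set_min_size(client, "my-asg", 4)
```

Lowering the minimum does not terminate instances by itself; it only allows the existing scale-in policy to remove them once load drops.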
