Our mobile app backend runs on Elastic Beanstalk with autoscaling, on 4 t2.small instances.
Sending out push notifications causes a large, short-lived spike in traffic to the servers. Since autoscaling takes ~3 minutes to kick in, it's fairly useless for these spikes.
How can we reduce latency during these spikes without burning excessive CPU/$ during lower-traffic periods?