
I am currently using AWS Elastic Container Service (ECS). It runs behind an HAProxy server acting as our load balancer, along with a Lambda that acts as a service discovery mechanism for the servers. Our servers have a custom scaling metric based on queue times; when this metric is exceeded, another server is added to our HAProxy load balancer and a new container is deployed to that server. The server scaling works fine; my issue is with the container scaling.

During a scale-up, the container is the first to attempt to start, and it fails because the new server is not yet available. Once the server is actually available, ECS might not attempt to start the container again for quite a long time.

Key question: Is there any way to increase the frequency with which ECS attempts to scale up the container, or is there something special that has to be done to force it to keep trying to scale up after a failure when using a custom load balancer?

I also notice this behaviour when a service has a desired count greater than its current count. It will not create new containers unless some activity causes the service to update.

I know when using an application load balancer this worked without issue.

Tim
Ramzi C.
  • A small clarification: your text "scales the particular servers in question up one" - "scale up" means make the server larger, eg moving from m4.large to m4.2xlarge, which takes some time. Do you mean "scale out adding one server"? Your question isn't as clear as it could be - do your servers have a custom scaling metric, or is this a property of an auto scaling group / ECS / etc? The container probably doesn't attempt to scale up, ECS may attempt to create a new container on a new server. Suggest you try to make your question a bit more precise. – Tim Jun 05 '18 at 00:33
  • The initial blurb is just describing the situation. The actual server scaling portion is fine, my issues lie with the container scaling. I will edit it to reflect that better. – Ramzi C. Jun 05 '18 at 00:47
  • Why are you using HAProxy rather than ALB if ALB works properly? Cost savings, flexibility, custom requirements - there are valid reasons, understanding them helps answer your question. Are you using AWS auto scaling or do you have custom code creating instances as required? – Tim Jun 05 '18 at 01:12
  • The ALB didn't work because we have some requirements for stickiness that don't play nicely with it. HAProxy as a load balancer doesn't actually cause any problems. Only container scaling does. The scaling is very simple, once the custom metric passes a threshold, an alarm triggers. This alarm causes server instances to increase by one through the autoscaling group and the desired count on the ecs service to increase by one. The server scales fine, but the container fails to scale (since the server is warming up) and never gets around to scaling. – Ramzi C. Jun 05 '18 at 01:25
  • You need to find a way to start the server a few minutes before the container is started. You can do almost anything in Lambda, which might be the best option for you. You could have one lambda function start the server, pause, then start the container. Alternately you could have the autoscaling group working, and a lambda function start every 5 minutes that checks for empty servers tagged in some specific way, or some other criteria like SQS / SNS. You may have to be a bit creative here since you've gone away from the AWS way of doing things with HAProxy. Might be easier to get ALB working. – Tim Jun 05 '18 at 07:51
  • Calling AWS ECS CLI update-service with --force-new-deployment seems to force the container service to meet its desired count if possible. My service discovery lambda, which runs every minute, already checks the state of the network, so I'll just have it call this when new servers are present. Hopefully it will work. – Ramzi C. Jun 05 '18 at 17:23
  • Is this still an issue using Fargate? Or was it only with EC2 instances? – wiggles Dec 05 '20 at 17:46
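
The workaround described in the comments (having the service discovery Lambda force a new deployment when the service is below its desired count) could be sketched roughly as follows. This is only an illustration under assumptions, not the asker's actual code: the helper names `needs_nudge` and `reconcile_service`, the `handler` entry point, and the `my-cluster` / `my-service` names are all hypothetical. The only AWS calls used are boto3's `describe_services` and `update_service` with `forceNewDeployment=True`. Note that forcing a new deployment may also replace tasks that are already running, so the sketch only does so when the service is actually short of tasks.

```python
def needs_nudge(service: dict) -> bool:
    """Return True when running plus pending tasks fall short of the desired count."""
    return service["runningCount"] + service["pendingCount"] < service["desiredCount"]


def reconcile_service(ecs, cluster: str, service: str) -> bool:
    """Force a new deployment if the service is below its desired count.

    `ecs` is a boto3 ECS client (or any object exposing the same two methods).
    Returns True if a deployment was forced, False otherwise.
    """
    described = ecs.describe_services(cluster=cluster, services=[service])
    if not described["services"]:
        # Service not found; nothing to do.
        return False
    if not needs_nudge(described["services"][0]):
        return False
    ecs.update_service(cluster=cluster, service=service, forceNewDeployment=True)
    return True


def handler(event, context):
    # Hypothetical Lambda entry point; cluster and service names are placeholders.
    import boto3  # imported here so the helpers above can be exercised without AWS

    ecs = boto3.client("ecs")
    return {"forced": reconcile_service(ecs, "my-cluster", "my-service")}
```

Run on the same one-minute schedule as the service discovery Lambda, and only once a new server has registered, this would retry placement soon after capacity appears rather than waiting for ECS to try again on its own.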

0 Answers