I'm trying to create a robust autoscaling process for my ECS cluster, but I'm running into problems with the resolution of the CpuUtilization metric. I have turned on 'Detailed metrics' for 1-min resolution, but I am still not getting good scaling results. I am deploying an ML model that takes roughly 1.5s per inference. I am not facing any memory bottleneck, so I am using CpuUtilization for scaling.

I need fast scaling because once requests start piling up, the response time easily shoots up to 3-5s. Currently, with 'Detailed Metrics' enabled, scale-out takes around 3-5 minutes to start, because 3 datapoints are checked at 1-min resolution. If I had a metric with 5-10s resolution, I could look at 6 datapoints within 30s and start the scale-out job faster.

I tried using Lambda, StepFunctions and EventBridge, following this blog. However, that approach does not give me CpuUtilization or MemoryUtilization, only the task, service and container counts.

Is there a way to get CPU and memory metrics directly from ECS? I know we can use cloudwatch.get_metric_statistics(), but that only returns datapoints that have already been reported to CloudWatch at 1-min resolution, so it does not help here.

1 Answer

You can't change that. The 1-minute resolution of the built-in metrics is set by AWS. The only way to get better resolution is to create your own custom metrics, which can have a resolution as fine as 1 second.
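As a rough sketch of what publishing such a high-resolution custom metric could look like with boto3: the namespace `Custom/ECS`, the metric name `CpuUtilizationHighRes`, the dimension values, and the helper names `build_datum` and `publish` are all made up for illustration. The relevant part is `StorageResolution=1` on `put_metric_data`, which marks the datapoint as high resolution.

```python
from datetime import datetime, timezone


def build_datum(cpu_percent: float, cluster: str, service: str) -> dict:
    """Build one high-resolution datapoint for CloudWatch put_metric_data."""
    return {
        "MetricName": "CpuUtilizationHighRes",  # hypothetical metric name
        "Dimensions": [
            {"Name": "ClusterName", "Value": cluster},
            {"Name": "ServiceName", "Value": service},
        ],
        "Timestamp": datetime.now(timezone.utc),
        "Value": float(cpu_percent),
        "Unit": "Percent",
        # 1 = high resolution (1-second granularity); 60 is the standard default.
        "StorageResolution": 1,
    }


def publish(cpu_percent: float, cluster: str, service: str) -> None:
    """Send the datapoint to CloudWatch. Needs boto3 and AWS credentials."""
    import boto3  # imported lazily so build_datum can be used without boto3

    cloudwatch = boto3.client("cloudwatch")
    cloudwatch.put_metric_data(
        Namespace="Custom/ECS",  # hypothetical namespace
        MetricData=[build_datum(cpu_percent, cluster, service)],
    )
```

Calling `publish(73.5, "my-cluster", "my-service")` every few seconds from the instance or container would produce a metric that a CloudWatch alarm can evaluate with a 10-second or 30-second period instead of 1 minute.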

Marcin
  • I think I was not clear enough. I already tried a custom metric, which I created using StepFunctions and EventBridge. It pings the ECS cluster for the task, service and instance counts every 5 seconds. I followed the blog mentioned above for that. Unfortunately, I am not able to get resource statistics like CPU or memory utilization via this method. Any ideas on how I can get those? – Priyam Mehta Feb 16 '22 at 08:55
  • @PriyamMehta You have to create your own custom metrics for Cpu or Memory Utilization. Can't use AWS provided ones, as they have 1 minute resolution only. – Marcin Feb 16 '22 at 09:37
  • I know, but I am struggling to create my own custom metrics for resource stats, and I am asking for a suggestion on how to proceed. Using cloudwatch.get_metric_statistics(), we can get resource stats, but only those that the ECS cluster reports to CloudWatch (1-min). What I am asking is: is there a way for me to directly ping the ECS cluster to get resource stats every second, which I can then use to create a custom metric? Hope this clears up the confusion! – Priyam Mehta Feb 16 '22 at 21:05
  • @PriyamMehta If you use an EC2-based cluster, you have to set up custom metrics on each instance in the cluster. If it's a Fargate cluster, each of your containers must monitor its own CPU and RAM usage and produce the custom metrics. – Marcin Feb 16 '22 at 22:03
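
One possible way for a container to monitor its own CPU usage, as the last comment suggests, is to read Docker-style stats from the ECS task metadata endpoint (version 4), which is exposed inside the container through the `ECS_CONTAINER_METADATA_URI_V4` environment variable. The sketch below is an assumption about how this could be wired up, not something from the thread; the helper names `cpu_percent` and `read_stats` are illustrative. The formula is the usual Docker stats calculation, so the result is relative to the host CPUs visible to the container, not to the task's CPU reservation as in the built-in ECS metric.

```python
import json
import os
import urllib.request


def cpu_percent(stats: dict) -> float:
    """Compute CPU usage in percent from one Docker-style stats document.

    Uses the delta between cpu_stats and precpu_stats, scaled by the number
    of online CPUs. Returns 0.0 when the deltas are not usable, for example
    on a first read where precpu_stats is still empty.
    """
    cpu = stats["cpu_stats"]
    pre = stats["precpu_stats"]
    cpu_delta = cpu["cpu_usage"]["total_usage"] - pre["cpu_usage"]["total_usage"]
    system_delta = cpu.get("system_cpu_usage", 0) - pre.get("system_cpu_usage", 0)
    if cpu_delta < 0 or system_delta <= 0:
        return 0.0
    return cpu_delta / system_delta * cpu.get("online_cpus", 1) * 100.0


def read_stats() -> dict:
    """Fetch this container's stats from the task metadata endpoint v4."""
    base = os.environ["ECS_CONTAINER_METADATA_URI_V4"]
    with urllib.request.urlopen(f"{base}/stats", timeout=2) as resp:
        return json.load(resp)
```

A small background loop in the container could call `cpu_percent(read_stats())` every few seconds and publish the value as a high-resolution custom metric.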