I'm trying to build a robust autoscaling process for my ECS cluster, but I'm running into problems with the resolution of the CPUUtilization metric. I have turned on 'Detailed metrics' for 1-min resolution, but I still can't get good scaling results. I am deploying an ML model that takes roughly 1.5s per inference. I am not hitting any memory bottleneck, so I am using CPUUtilization for scaling.
I need fast scaling because, once requests start piling up, the response time easily shoots up to 3-5s. Currently, with 'Detailed Metrics' enabled, scale-out takes around 3-5 minutes to start, since 3 datapoints are checked at 1-min resolution. If I had a metric at 5-10s resolution, I could evaluate 6 datapoints within 30s (at 5s resolution) and start the scale-out job much sooner.
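To make the arithmetic concrete, here is a rough sketch of the delay I am reasoning about. It only multiplies the metric period by the number of datapoints the alarm waits for, so it is a lower bound: it ignores metric publication lag and the time the scaling action itself takes. The function name is just something I made up for illustration.

```python
def detection_delay_seconds(period_seconds: int, datapoints: int) -> int:
    """Lower bound on the time needed to collect the required datapoints.

    Assumes the alarm needs `datapoints` consecutive samples, one per period,
    and ignores publication lag and the scaling action itself.
    """
    return period_seconds * datapoints


# Current setup: 3 datapoints at 1-min resolution.
print(detection_delay_seconds(60, 3))  # 180

# What I would like: 6 datapoints at 5s resolution.
print(detection_delay_seconds(5, 6))   # 30
```

So even in the best case the current setup needs about 3 minutes before scale-out can begin, compared with roughly 30s at 5s resolution.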
I tried using Lambda, Step Functions and EventBridge, following this blog, but I am not able to get CPUUtilization or MemoryUtilization that way, only the task, service and container counts.
Is there a way to get CPU and memory metrics directly from ECS? I know we can use cloudwatch.get_metric_statistics(), but that only returns datapoints that have already been reported to CloudWatch, so it does not give me any finer resolution.
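For reference, this is roughly the kind of call I mean. It is a minimal sketch using boto3, assuming the service-level metric in the `AWS/ECS` namespace with `ClusterName` and `ServiceName` dimensions; the helper names and the cluster/service names are placeholders, not my real setup. The `Period` is 60 because that is the finest resolution at which these datapoints are reported.

```python
from datetime import datetime, timedelta, timezone


def build_cpu_query(cluster, service, minutes=5, now=None):
    """Build keyword arguments for cloudwatch.get_metric_statistics()."""
    now = now or datetime.now(timezone.utc)
    return {
        "Namespace": "AWS/ECS",
        "MetricName": "CPUUtilization",
        "Dimensions": [
            {"Name": "ClusterName", "Value": cluster},
            {"Name": "ServiceName", "Value": service},
        ],
        "StartTime": now - timedelta(minutes=minutes),
        "EndTime": now,
        "Period": 60,  # seconds; one datapoint per minute
        "Statistics": ["Average"],
    }


def fetch_cpu_datapoints(cluster, service, minutes=5):
    """Fetch recent CPU datapoints, oldest first (needs AWS credentials)."""
    import boto3  # imported here so the query builder works without boto3

    cloudwatch = boto3.client("cloudwatch")
    response = cloudwatch.get_metric_statistics(
        **build_cpu_query(cluster, service, minutes)
    )
    return sorted(response["Datapoints"], key=lambda d: d["Timestamp"])
```

Calling `fetch_cpu_datapoints("my-cluster", "my-service")` gives at most one datapoint per minute, which is exactly the limitation I am trying to get around.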