4

We are running a video conferencing server in an EC2 instance.

Since this is an egress-heavy (data out) app, we want to monitor network out closely, because we are charged heavily for it.

[Screenshot: nload (left), nmon (top right), and the AWS CloudWatch network out graph (bottom right)]

As the screenshot above shows, in our test on the EC2 server nload (left) reports network out as 138 Mbits/s and nmon (top right) reports 17263 KB/s. The two agree closely (138/8 = 17.25 MB/s).
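The unit check above can be written out as a short sketch. It assumes decimal units (1 Mbit = 1000 kbit) and 8 bits per byte; whether nmon's "KB" means 1000 or 1024 bytes is not confirmed here, so treat the result as approximate. The function name is made up for illustration.

```python
def mbits_to_kbytes_per_s(mbits_per_s: float) -> float:
    """Convert Mbit/s to KB/s, assuming decimal units and 8 bits per byte."""
    return mbits_per_s * 1000 / 8


nload_mbits = 138    # nload reading from the question
nmon_kbytes = 17263  # nmon reading from the question

expected_kbytes = mbits_to_kbytes_per_s(nload_mbits)
# Relative difference between the converted nload value and the nmon value.
relative_gap = abs(nmon_kbytes - expected_kbytes) / expected_kbytes

print(f"{nload_mbits} Mbit/s = {expected_kbytes:.0f} KB/s")
print(f"gap to nmon reading: {relative_gap:.2%}")
```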

But when we check network out (bytes) in AWS CloudWatch (bottom right), the number shown is far higher (~1 GB/s), which makes more sense for the test we are running, and this is the number we are ultimately charged for.

Why is there such a big difference between nmon/nload and AWS CloudWatch? Are we missing something here? Are we reading the AWS CloudWatch metrics incorrectly?

Thank you for your help!

Edit:

Adding a screenshot of a longer test, in which the average network out metric in AWS CloudWatch stays flat at around 1 GB for the test duration while nmon shows an average network out of 15816 KB/s.

[Screenshot: longer test, AWS CloudWatch network out graph alongside nmon]

Raman Kishore
  • The CW screenshot you showed is the average network out bytes in 5 minutes. – jellycsc Jun 21 '21 at 15:34
  • @jellycsc - But, in the screenshot it says -> Statistic: Average & Period: 1 second. – Raman Kishore Jun 21 '21 at 15:38
  • I know. Although you selected `1 second`, the data is not captured or pushed to CW at that frequency. You can kinda see in the CW graph that the data points are roughly 5 minutes apart. – jellycsc Jun 21 '21 at 15:41
  • Ok. Even if it's a 5 min average, I'm not sure how to make sense out of it when compared with nmon data. In our tests, we see the CW graph to be flat at 1 GB for the test duration (when we select the metrics - Statistic: Average & Period: 1 second) . But, nmon still shows average network out of around 15-18 MB/s. – Raman Kishore Jun 21 '21 at 15:44
  • @jellycsc - Added a screenshot showing the data for a longer test duration. – Raman Kishore Jun 21 '21 at 16:07

2 Answers

1

Just figured out the answer to this.

The following link talks about the periods of data capture in AWS: https://docs.aws.amazon.com/AmazonCloudWatch/latest/monitoring/cloudwatch_concepts.html

Periods

A period is the length of time associated with a specific Amazon CloudWatch statistic. Each statistic represents an aggregation of the metrics data collected for a specified period of time. Periods are defined in numbers of seconds, and valid values for period are 1, 5, 10, 30, or any multiple of 60. For example, to specify a period of six minutes, use 360 as the period value. You can adjust how the data is aggregated by varying the length of the period. A period can be as short as one second or as long as one day (86,400 seconds). The default value is 60 seconds.

Only custom metrics that you define with a storage resolution of 1 second support sub-minute periods. Even though the option to set a period below 60 is always available in the console, you should select a period that aligns to how the metric is stored. For more information about metrics that support sub-minute periods, see High-resolution metrics.


As the documentation above states, unless we publish a custom metric with a storage resolution of 1 second, CloudWatch does not store sub-minute data. So the finest granularity available is one data point per minute.

So, in our case, the network out data for each 60-second interval is aggregated and captured as a single data point.

Even if I set the statistic to Average and the period to 1 second, the graph still shows one data point per minute.

Now, if I divide 1.01 GB (shown by CloudWatch) by 60, I get the per-second rate, which is roughly 16.8 MB/s and very close to what nmon and nload show.
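For anyone who wants the same numbers outside the console, here is a minimal sketch using boto3's `get_metric_statistics`. It assumes boto3 is installed, AWS credentials are configured, and detailed (one-minute) monitoring is enabled on the instance; the helper names and the default time window are made up for illustration.

```python
from datetime import datetime, timedelta, timezone


def datapoints_to_rates(datapoints, period_seconds):
    """Turn per-period byte totals (CloudWatch 'Sum') into (timestamp, MB/s) pairs."""
    ordered = sorted(datapoints, key=lambda p: p["Timestamp"])
    return [(p["Timestamp"], p["Sum"] / period_seconds / 1e6) for p in ordered]


def fetch_network_out_rates(instance_id, minutes=30, period_seconds=60):
    """Fetch NetworkOut sums for one instance and convert them to MB/s."""
    import boto3  # imported here so the helper above works without boto3

    cloudwatch = boto3.client("cloudwatch")
    end = datetime.now(timezone.utc)
    response = cloudwatch.get_metric_statistics(
        Namespace="AWS/EC2",
        MetricName="NetworkOut",
        Dimensions=[{"Name": "InstanceId", "Value": instance_id}],
        StartTime=end - timedelta(minutes=minutes),
        EndTime=end,
        Period=period_seconds,  # assumes one-minute data is actually stored
        Statistics=["Sum"],
    )
    return datapoints_to_rates(response["Datapoints"], period_seconds)
```

Calling `fetch_network_out_rates("i-0123456789abcdef0")` (a placeholder instance ID) should return per-minute rates that can be compared directly with nmon. Requesting `Sum` rather than `Average` keeps the "divide by the period" step explicit.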

Raman Kishore
0

From the AWS docs:

NetworkOut: The number of bytes sent out by the instance on all network interfaces. This metric identifies the volume of outgoing network traffic from a single instance. The number reported is the number of bytes sent during the period. If you are using basic (five-minute) monitoring, you can divide this number by 300 to find Bytes/second. If you have detailed (one-minute) monitoring, divide it by 60.

The NetworkOut graph in your case does not represent the current speed; it represents the number of bytes sent out by all network interfaces in the last 5 minutes. If my calculations are correct, we should get the following values:

1.01 GB ~= 1027 MB (reading from your graph)

To get the average speed for the last 5 minutes:

1027 MB / 300 s ≈ 3.42 MB/s ≈ 27.39 Mbits/s

That is still well below the ~138 Mbits/s that nload reports, although it is only an average over the last 5 minutes.
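Which divisor applies depends on whether the instance uses basic (five-minute) or detailed (one-minute) monitoring. The sketch below simply tries both on the value read from the graph and prints each next to the nload reading; it assumes 1027 MB is the per-data-point total, and the function name is made up for illustration.

```python
def mbits_per_s(total_megabytes: float, period_seconds: int) -> float:
    """Average rate in Mbit/s for a byte total accumulated over one period."""
    return total_megabytes / period_seconds * 8


graph_total_mb = 1027  # per-data-point value read from the CloudWatch graph
nload_mbits = 138      # nload reading from the question

for label, period in (("basic, 5-minute", 300), ("detailed, 1-minute", 60)):
    rate = mbits_per_s(graph_total_mb, period)
    print(f"{label}: {rate:.1f} Mbit/s (nload: {nload_mbits} Mbit/s)")
```

The 60-second divisor gives a rate much closer to the nload reading than the 300-second one, which suggests the data points in the graph cover one minute each.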

Ervin Szilagyi
  • Thanks for the answer, Ervin. I had thought the same and did similar calculation but still found the Cloudwatch metric to be way off from what nmon or nload shows. nload shows around 100+ Mbits/s average for this test. – Raman Kishore Jun 21 '21 at 16:07