I'm trying to determine the cause of some high latency I'm seeing on my ElastiCache Redis node (cache.m3.medium). I gathered some data using the redis-cli latency test, running it from an EC2 instance in the same region/availability-zone as the ElastiCache node.

The average latency is quite good (~0.5 ms), but there are some fairly high outliers. I don't believe the outliers are due to network latency, as network ping tests between two EC2 instances don't exhibit these spikes.

The Redis node is not under any load, and the metrics seem to look fine.

My questions are:

  1. What might be causing the high max latencies?
  2. Are these max latencies expected?
  3. What other steps/tests/tools would you use to further diagnose the issue?

user@my-ec2-instance:~/redis-3.2.8$ ./src/redis-cli -h redis-host --latency-history -i 1
min: 0, max: 12, avg: 0.45 (96 samples) -- 1.01 seconds range
min: 0, max: 1, avg: 0.33 (96 samples) -- 1.00 seconds range
min: 0, max: 3, avg: 0.33 (96 samples) -- 1.01 seconds range
min: 0, max: 2, avg: 0.29 (96 samples) -- 1.01 seconds range
min: 0, max: 2, avg: 0.26 (96 samples) -- 1.01 seconds range
min: 0, max: 1, avg: 0.34 (96 samples) -- 1.00 seconds range
min: 0, max: 4, avg: 0.34 (96 samples) -- 1.01 seconds range
min: 0, max: 1, avg: 0.26 (96 samples) -- 1.00 seconds range
min: 0, max: 5, avg: 0.33 (96 samples) -- 1.01 seconds range
min: 0, max: 1, avg: 0.31 (96 samples) -- 1.00 seconds range
min: 0, max: 1, avg: 0.33 (96 samples) -- 1.00 seconds range
min: 0, max: 1, avg: 0.28 (96 samples) -- 1.00 seconds range
min: 0, max: 1, avg: 0.30 (96 samples) -- 1.00 seconds range
min: 0, max: 4, avg: 0.35 (96 samples) -- 1.01 seconds range
min: 0, max: 15, avg: 0.52 (95 samples) -- 1.01 seconds range
min: 0, max: 4, avg: 0.48 (94 samples) -- 1.00 seconds range
min: 0, max: 2, avg: 0.54 (94 samples) -- 1.00 seconds range
min: 0, max: 1, avg: 0.38 (96 samples) -- 1.01 seconds range
min: 0, max: 8, avg: 0.55 (94 samples) -- 1.00 seconds range
Chris McBride
  • Have experienced the same. It seems to be network latency, and I have not been able to fix it. – Tim Wachter Jul 31 '17 at 13:01
  • I have exactly the same issue. Have you found the reason? – m1cha3l Dec 06 '17 at 09:57
  • I am having a similar problem with ElastiCache Memcached 1.4.34, but didn't have this issue with 1.4.14, so I'm not sure whether this is a network problem or a version-specific problem. My ElastiCache CPU is ~2%, and from time to time I see spikes as high as 4 s! – Tom Raganowicz Sep 13 '18 at 10:42
  • How did you run these latency tests? I suspect this is affecting me too, but I need data to confirm. – Bhargav Oct 10 '19 at 18:01

1 Answer

I ran tests with several different node types, and found that bigger nodes performed much better. I'm using the cache.m3.xlarge type, which has provided more consistent network latency.
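
One way to compare node types on the same footing is to summarize the `--latency-history` output from each run rather than eyeballing it. The following is a minimal sketch, assuming the output format shown in the question; the function names and the 5 ms spike threshold are hypothetical choices for illustration, not part of redis-cli.

```python
import re

# Matches one window of `redis-cli --latency-history` output, e.g.
# "min: 0, max: 12, avg: 0.45 (96 samples) -- 1.01 seconds range"
LINE_RE = re.compile(r"min: (\d+), max: (\d+), avg: ([\d.]+) \((\d+) samples\)")

# A few lines copied from the output in the question.
SAMPLE = """\
min: 0, max: 12, avg: 0.45 (96 samples) -- 1.01 seconds range
min: 0, max: 1, avg: 0.33 (96 samples) -- 1.00 seconds range
min: 0, max: 15, avg: 0.52 (95 samples) -- 1.01 seconds range
"""


def parse_latency_history(text):
    """Return one dict per latency window; lines that do not match are skipped."""
    rows = []
    for line in text.splitlines():
        match = LINE_RE.search(line)
        if match:
            lo, hi, avg, n = match.groups()
            rows.append(
                {"min": int(lo), "max": int(hi), "avg": float(avg), "samples": int(n)}
            )
    return rows


def summarize(rows, spike_ms=5):
    """Summarize windows; spike_ms is an arbitrary, hypothetical threshold."""
    if not rows:
        raise ValueError("no latency lines found")
    total_samples = sum(r["samples"] for r in rows)
    return {
        "windows": len(rows),
        "worst_max_ms": max(r["max"] for r in rows),
        # Number of windows whose max latency reached the threshold.
        "spike_windows": sum(1 for r in rows if r["max"] >= spike_ms),
        # Per-window averages weighted by that window's sample count.
        "weighted_avg_ms": sum(r["avg"] * r["samples"] for r in rows) / total_samples,
    }


if __name__ == "__main__":
    print(summarize(parse_latency_history(SAMPLE)))
```

Saving the output of a run on each node type to a file and passing its contents to `parse_latency_history` would give a spike count per node type, which is easier to compare than the raw per-second lines.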

Chris McBride