High Redis latency in AWS (ElastiCache)

Question

I'm trying to determine the cause of some high latency I'm seeing on my ElastiCache Redis node (cache.m3.medium). I gathered some data using the redis-cli latency test, running it from an EC2 instance in the same region/availability-zone as the ElastiCache node.

I see that the latency is quite good on average (~0.5 ms), but there are some fairly high outliers, with max values of 8 to 15 ms in some one-second windows. I don't believe the outliers are due to network latency, as network ping tests between two EC2 instances don't exhibit these spikes.

The Redis node is not under any load, and the metrics seem to look fine.

My questions are:

What might be causing the high max latencies?
Are these max latencies expected?
What other steps/tests/tools would you use to further diagnose the issue?

user@my-ec2-instance:~/redis-3.2.8$ ./src/redis-cli -h redis-host --latency-history -i 1
min: 0, max: 12, avg: 0.45 (96 samples) -- 1.01 seconds range
min: 0, max: 1, avg: 0.33 (96 samples) -- 1.00 seconds range
min: 0, max: 3, avg: 0.33 (96 samples) -- 1.01 seconds range
min: 0, max: 2, avg: 0.29 (96 samples) -- 1.01 seconds range
min: 0, max: 2, avg: 0.26 (96 samples) -- 1.01 seconds range
min: 0, max: 1, avg: 0.34 (96 samples) -- 1.00 seconds range
min: 0, max: 4, avg: 0.34 (96 samples) -- 1.01 seconds range
min: 0, max: 1, avg: 0.26 (96 samples) -- 1.00 seconds range
min: 0, max: 5, avg: 0.33 (96 samples) -- 1.01 seconds range
min: 0, max: 1, avg: 0.31 (96 samples) -- 1.00 seconds range
min: 0, max: 1, avg: 0.33 (96 samples) -- 1.00 seconds range
min: 0, max: 1, avg: 0.28 (96 samples) -- 1.00 seconds range
min: 0, max: 1, avg: 0.30 (96 samples) -- 1.00 seconds range
min: 0, max: 4, avg: 0.35 (96 samples) -- 1.01 seconds range
min: 0, max: 15, avg: 0.52 (95 samples) -- 1.01 seconds range
min: 0, max: 4, avg: 0.48 (94 samples) -- 1.00 seconds range
min: 0, max: 2, avg: 0.54 (94 samples) -- 1.00 seconds range
min: 0, max: 1, avg: 0.38 (96 samples) -- 1.01 seconds range
min: 0, max: 8, avg: 0.55 (94 samples) -- 1.00 seconds range
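
One way to quantify the outliers in output like the above is to parse it and count how many one-second windows contain a spike. The following is a minimal Python sketch, not something from the original test run. The function names and the 5 ms threshold are illustrative assumptions, and the values are treated as milliseconds, which is how redis-cli reports them.

```python
import re

# Matches one line of `redis-cli --latency-history` output, e.g.
# "min: 0, max: 12, avg: 0.45 (96 samples) -- 1.01 seconds range"
LINE_RE = re.compile(r"min: (\d+), max: (\d+), avg: ([\d.]+) \((\d+) samples\)")


def parse_latency_history(text):
    """Return a list of (min_ms, max_ms, avg_ms, samples) tuples."""
    rows = []
    for line in text.splitlines():
        m = LINE_RE.search(line)
        if m:  # silently skip the prompt line and anything else that does not match
            rows.append((int(m.group(1)), int(m.group(2)),
                         float(m.group(3)), int(m.group(4))))
    return rows


def spike_summary(rows, threshold_ms=5):
    """Count windows whose max latency is at or above threshold_ms.

    threshold_ms is a hypothetical cut-off chosen for illustration.
    """
    spikes = [r for r in rows if r[1] >= threshold_ms]
    return {
        "windows": len(rows),
        "spike_windows": len(spikes),
        "worst_max_ms": max((r[1] for r in rows), default=0),
    }
```

Feeding the captured output into parse_latency_history and then spike_summary gives a rough spike rate per window, which is easier to compare across runs or node types than eyeballing the max column.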

I have experienced the same. It seems to be network latency, and I have not been able to fix it. – Tim Wachter, Jul 31 '17 at 13:01
I am having a similar problem with ElastiCache Memcached 1.4.34, but didn't have this issue with 1.4.14, so I'm not sure whether this is a network problem or a version-specific problem. My ElastiCache CPU is at ~2%, and from time to time I see spikes as high as 4 s! – Tom Raganowicz, Sep 13 '18 at 10:42
How did you run these latency tests? I suspect this is affecting me too, but I need data to confirm. – Bhargav, Oct 10 '19 at 18:01

score 9 · Accepted Answer · answered Dec 17 '17 at 02:35

I ran tests with several different node types, and found that bigger nodes performed much better. I'm using the cache.m3.xlarge type, which has provided more consistent network latency.
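
To compare node types on consistency rather than on the average, it can help to look at tail percentiles. Below is a rough Python sketch, assuming a Redis endpoint that is reachable without AUTH or TLS. The helper names are hypothetical, and this is not necessarily how the tests described here were run.

```python
import math
import socket
import time


def ping_latencies_ms(host, port=6379, count=100):
    """Time `count` PING round trips over a single TCP connection.

    Assumes the server accepts the inline command "PING" without
    authentication and replies with "+PONG".
    """
    samples = []
    with socket.create_connection((host, port), timeout=5) as sock:
        sock.setsockopt(socket.IPPROTO_TCP, socket.TCP_NODELAY, 1)
        for _ in range(count):
            start = time.perf_counter()
            sock.sendall(b"PING\r\n")
            reply = b""
            while not reply.endswith(b"\r\n"):  # read until the reply line is complete
                chunk = sock.recv(64)
                if not chunk:
                    raise ConnectionError("connection closed by server")
                reply += chunk
            elapsed_ms = (time.perf_counter() - start) * 1000.0
            if not reply.startswith(b"+PONG"):
                raise RuntimeError(f"unexpected reply: {reply!r}")
            samples.append(elapsed_ms)
    return samples


def percentile(samples, pct):
    """Nearest-rank percentile of a non-empty list of samples."""
    ordered = sorted(samples)
    rank = max(1, math.ceil(pct / 100 * len(ordered)))
    return ordered[rank - 1]


def summarize(samples):
    """Return median, 99th percentile, and max in milliseconds."""
    return {
        "p50_ms": percentile(samples, 50),
        "p99_ms": percentile(samples, 99),
        "max_ms": max(samples),
    }
```

Running summarize(ping_latencies_ms("redis-host")) against each node type from the same EC2 instance would give p50, p99, and max figures that can be compared side by side; "redis-host" is the placeholder hostname from the question.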

Chris McBride

Curious, what are the latency numbers with xlarge? – beautifulcoder Jul 16 '23 at 15:26
