How to improve latency of Redis operations when Redis is hosted as a single pod in a K8s cluster

Question

I have a k8s cluster with application pods and a single Redis pod in it. Some application pods access the Redis pod to set or get key-value pairs, using only simple SET and GET operations and nothing more complex. It is basically being used as an in-cluster centralised cache.

My application pods are .NET services running on .NET 6.0, and I'm using the C# library StackExchange.Redis to connect to and access the Redis cache.

The issue I'm facing is that the p99 latency of SET and GET operations is around 10 ms, which is far too high given that both the Redis pod and the application pod accessing it have enough CPU and memory. During load tests, the CPU and memory usage of both pods is typically <=50% of the requested resources. The Redis pod and the application pods accessing it are in the same node pool. Even the p95 is ~7 ms.
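
For context on what a client-side p99 means here, the following is a minimal sketch of how per-call latencies could be timed and reduced to percentiles. The `LatencyProbe` class, its method names, and the nearest-rank percentile definition are assumptions for illustration, not necessarily how the numbers above were collected.

```csharp
using System;
using System.Collections.Generic;
using System.Diagnostics;
using System.Threading.Tasks;

// Hypothetical helper: the names and the nearest-rank method are assumptions.
public static class LatencyProbe
{
    // Awaits the operation `iterations` times, one call after another,
    // and returns the elapsed time of each call in milliseconds.
    public static async Task<List<double>> MeasureAsync(Func<Task> operation, int iterations)
    {
        var samples = new List<double>(iterations);
        for (int i = 0; i < iterations; i++)
        {
            long start = Stopwatch.GetTimestamp();
            await operation().ConfigureAwait(false);
            long end = Stopwatch.GetTimestamp();
            samples.Add((end - start) * 1000.0 / Stopwatch.Frequency);
        }
        return samples;
    }

    // Nearest-rank percentile: the smallest sample such that at least
    // p percent of all samples are less than or equal to it.
    public static double Percentile(IReadOnlyList<double> samples, double p)
    {
        if (samples.Count == 0)
        {
            throw new ArgumentException("No samples.", nameof(samples));
        }

        var sorted = new List<double>(samples);
        sorted.Sort();
        int rank = (int)Math.Ceiling(p / 100.0 * sorted.Count);
        rank = Math.Clamp(rank, 1, sorted.Count);
        return sorted[rank - 1];
    }
}
```

Passing a lambda that awaits a single GET or SET to `MeasureAsync` and then calling `Percentile(samples, 99)` gives a figure comparable to the p99 quoted above. A sequential loop like this measures the round trip of one call at a time, so it may differ from numbers taken under a concurrent load test.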

My payload size is very small: in my application the key size is ~250 bytes and the value size is also ~250 bytes.

I also have Linkerd sidecar proxies for mTLS, and I understand that Redis traffic is mTLS'ed by default (I'm using the default port 6379, so no extra config is needed). The Linkerd sidecar proxies' CPU and memory usage is also well below the requested resources.

I'm deploying Redis using the Bitnami Helm chart (https://github.com/bitnami/charts/tree/main/bitnami/redis#parameters). It is deployed in the standalone architecture with only a single master pod and with persistence disabled. It is deployed as a StatefulSet.

My k8s configuration: I'm using Azure Kubernetes Service. Node VM type: Standard_D8s_v3 (8 cores, 32 GB, Ubuntu VMs)

    PS D:/>kubectl get services
    NAME                    TYPE        CLUSTER-IP   EXTERNAL-IP   PORT(S)    AGE
    redis-charts-headless   ClusterIP   None         <none>        6379/TCP   4d3h
    redis-charts-master     ClusterIP   10.0.153.1   <none>        6379/TCP   4d3h

The code below shows how I'm initialising the Redis connection. The RedisCacheClient is added to the DI container as a singleton. It also has methods AddAsync(key, value, ttl) and GetAsync(key, value), which basically call await this.redis.GetDatabase().StringSetAsync() and StringGetAsync() respectively.

    public RedisCacheClient(
        IRedisCacheClientConfig redisCacheClientConfig,
        ILogger<RedisCacheClient> logger)
    {
        ArgumentUtility.CheckForNull(redisCacheClientConfig, nameof(redisCacheClientConfig));

        this.logger = logger;

        // For more details of configuration: https://stackexchange.github.io/StackExchange.Redis/Configuration#configuration-options.
        ConfigurationOptions configurationOptions = new ConfigurationOptions
        {
            EndPoints =
            {
                { "redis-charts-master", 6379 },
            },

            Password = redisCacheClientConfig.RedisCachePassword,

            // If true, will throw exception if connection fails (say redis pod is down) during object initialisation.
            // If false, will keep retrying.
            AbortOnConnectFail = false,

            // In milliseconds, same value used for async as well.
            // Fail fast, so that cache miss flow can be triggered.
            SyncTimeout = 1000,
        };

        this.redis = ConnectionMultiplexer.Connect(configurationOptions);
    }
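
The two wrapper methods mentioned above are thin pass-throughs to StackExchange.Redis. A minimal sketch of what they might look like follows. The class name `RedisCacheClientSketch`, the exact signatures, and the string value type are assumptions, since the original methods are not shown.

```csharp
using System;
using System.Threading.Tasks;
using StackExchange.Redis;

// Hypothetical sketch: the class name and method signatures are assumptions.
public sealed class RedisCacheClientSketch
{
    private readonly IConnectionMultiplexer redis;

    public RedisCacheClientSketch(IConnectionMultiplexer redis)
    {
        this.redis = redis ?? throw new ArgumentNullException(nameof(redis));
    }

    // Issues SET with an expiry; the task result is true if the value was stored.
    public Task<bool> AddAsync(string key, string value, TimeSpan ttl)
    {
        return this.redis.GetDatabase().StringSetAsync(key, value, ttl);
    }

    // Issues GET; returns null when the key does not exist (cache miss).
    public async Task<string?> GetAsync(string key)
    {
        RedisValue value = await this.redis.GetDatabase()
            .StringGetAsync(key)
            .ConfigureAwait(false);
        return value.HasValue ? value.ToString() : null;
    }
}
```

`GetDatabase()` is a cheap call that returns a lightweight handle over the shared multiplexer, so calling it on every operation, as here, should not add a network round trip.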

I also measured the intrinsic latency of my Redis pod, as described at https://redis.io/docs/management/optimization/latency/#latency-baseline:

    I have no name!@redis-charts-master-0:/$ redis-cli --intrinsic-latency 100
    Max latency so far: 1 microseconds.
    Max latency so far: 3 microseconds.
    Max latency so far: 24 microseconds.
    Max latency so far: 54 microseconds.
    Max latency so far: 66 microseconds.
    Max latency so far: 145 microseconds.
    Max latency so far: 230 microseconds.
    Max latency so far: 243 microseconds.
    Max latency so far: 260 microseconds.
    Max latency so far: 312 microseconds.
    Max latency so far: 431 microseconds.
    Max latency so far: 1454 microseconds.
    Max latency so far: 1882 microseconds.
    Max latency so far: 2884 microseconds.
    Max latency so far: 3638 microseconds.
    Max latency so far: 4357 microseconds.
    Max latency so far: 4922 microseconds.
    Max latency so far: 9385 microseconds.

    1649509850 total runs (avg latency: 0.0606 microseconds / 60.62 nanoseconds per run).
    Worst run took 154806x longer than the average latency.

I also ran redis-benchmark from an application pod as shown below; the p99 is around 7-8 ms:

    root@some_application_pod:/app# redis-benchmark -h redis-charts-master -a 'PASSWORD_HERE' -c 10 -n 1000000 -d 2000 -r 10000 -t set,get -l

What else can I do to improve my latency? Am I doing something wrong, or is this the best performance I can get given that the Redis pod is containerised and runs on top of a VM?

Removing (disabling) the Linkerd sidecar container improved p99 latency by 50%, from ~10 ms to ~5 ms! This was totally unexpected, given that even with Linkerd enabled its CPU and memory usage was well below the requested resources, and that the Redis port (6379) is already in the default opaque ports list. - Bublu, Jun 14 '23 at 11:19
