I am trying to use Redis Sentinel to get/set keys in Redis, and I was stress testing my setup with about 2000 concurrent requests.

I used Sentinel to put a single key in Redis and then executed 1000 concurrent get requests against it.

But the underlying Jedis client used by my Sentinel setup blocks on getResource() (the pool size is 500). The overall average response time I am getting is around 500 ms, while my target was about 10 ms.

Here is a sample from a jvisualvm snapshot:

Method                                                                                 Self time [%]  Self time            Invocations
redis.clients.jedis.JedisSentinelPool.getResource()                                    98.02227       4.0845232601E7 ms    4779
redis.clients.jedis.BinaryJedis.get()                                                  1.6894469      703981.381 ms        141
org.apache.catalina.core.ApplicationFilterChain.doFilter()                             0.12820946     53424.035 ms         6875
org.springframework.core.serializer.support.DeserializingConverter.convert()           0.046286926    19287.457 ms         4
redis.clients.jedis.JedisSentinelPool.returnResource()                                 0.04444578     18520.263 ms         4
org.springframework.aop.framework.CglibAopProxy$DynamicAdvisedInterceptor.intercept()  0.035538       14808.45 ms          11430

Can anyone help me debug this issue further?

Global Warrior

1 Answer


Here is the JedisSentinelPool implementation of getResource(), from the Jedis sources (2.6.2):

@Override
public Jedis getResource() {
    while (true) {
      Jedis jedis = super.getResource();
      jedis.setDataSource(this);

      // get a reference because it can change concurrently
      final HostAndPort master = currentHostMaster;
      final HostAndPort connection = new HostAndPort(jedis.getClient().getHost(), jedis.getClient()
          .getPort());

      if (master.equals(connection)) {
        // connected to the correct master
        return jedis;
      } else {
        returnBrokenResource(jedis);
      }
    }
}

Note the while(true) and the returnBrokenResource(jedis): the method takes whichever Jedis instance the pool hands out, checks whether it is actually connected to the current master, and if it is not, discards it and tries again. It is a crude check, and it is also a blocking call.

The super.getResource() call goes to the regular JedisPool implementation, which is built on Apache Commons Pool (2.0). Borrowing an object from that pool involves a fair amount of work, and I think it even repairs failed connections, for instance. With heavy contention on your pool, as is probably the case in your stress test, it can take a long time to get a resource, only to find that it is not connected to the correct master. You then end up calling it again, which adds contention, which makes getting a resource even slower, and so on.
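
To make the cost of that loop concrete, here is a toy model of it in plain Java. It is only a sketch: StaleCheckDemo, FakeConnection and borrowsUntilMaster are made-up names, not Jedis classes, and I am assuming that an empty pool simply opens a fresh connection to the current master. It counts how many borrows one caller needs when stale connections sit ahead of a good one.

```java
import java.util.ArrayDeque;
import java.util.Deque;

// Toy model only: none of these types exist in Jedis.
final class StaleCheckDemo {

    // Stand-in for a pooled connection that remembers the host it was opened against.
    static final class FakeConnection {
        final String host;

        FakeConnection(String host) {
            this.host = host;
        }
    }

    // Mimics the borrow / compare / discard loop and returns how many borrows
    // were needed before a connection to currentMaster was found.
    static int borrowsUntilMaster(Deque<FakeConnection> idle, String currentMaster) {
        int borrows = 0;
        while (true) {
            borrows++;
            FakeConnection connection = idle.poll();
            if (connection == null) {
                // Assumption: an empty pool opens a new connection to the current master.
                connection = new FakeConnection(currentMaster);
            }
            if (connection.host.equals(currentMaster)) {
                return borrows;
            }
            // Stale connection: drop it and go around again.
        }
    }

    public static void main(String[] args) {
        Deque<FakeConnection> idle = new ArrayDeque<>();
        // Three connections left over from an old master, then one good one.
        idle.add(new FakeConnection("old-master"));
        idle.add(new FakeConnection("old-master"));
        idle.add(new FakeConnection("old-master"));
        idle.add(new FakeConnection("new-master"));
        System.out.println(borrowsUntilMaster(idle, "new-master")); // prints 4
    }
}
```

In the real pool every one of those borrows goes through the shared pool under contention, which is why a few stale connections can be so expensive in a stress test.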

You should check the Jedis instances in your pool to see whether many of them are 'bad' connections.

For your stress test, you could stop using a shared pool altogether (create the Jedis instances manually, connected to the correct node, and close them cleanly), or set up several pools to reduce the cost of sifting through unchecked, "dirty" Jedis resources.

Also, with a pool of 500 Jedis instances you can't emulate 1000 concurrent queries: at most 500 callers can hold a connection at once while the rest block in getResource(). You need a pool of at least 1000.
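
The pool-size limit can be seen without Redis at all. The sketch below is a stand-in, not Jedis code: a Semaphore plays the role of the connection pool, and PoolSizeDemo and peakConcurrency are hypothetical names. It starts more tasks than there are permits and records the highest number of tasks that held a permit at the same time, which can never exceed the pool size.

```java
import java.util.concurrent.CountDownLatch;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Semaphore;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.atomic.AtomicInteger;

// Toy model only: the Semaphore stands in for the connection pool.
final class PoolSizeDemo {

    // Runs `clients` tasks that each need one of `poolSize` permits and returns
    // the highest number of tasks that held a permit at the same time.
    static int peakConcurrency(int poolSize, int clients) {
        Semaphore permits = new Semaphore(poolSize);
        AtomicInteger inUse = new AtomicInteger();
        AtomicInteger peak = new AtomicInteger();
        CountDownLatch start = new CountDownLatch(1);
        ExecutorService executor = Executors.newFixedThreadPool(clients);
        for (int i = 0; i < clients; i++) {
            executor.execute(() -> {
                try {
                    start.await();        // release all tasks at once
                    permits.acquire();    // blocks, like getResource() on an exhausted pool
                    try {
                        int now = inUse.incrementAndGet();
                        peak.accumulateAndGet(now, Math::max);
                        Thread.sleep(5);  // pretend to run a GET
                    } finally {
                        inUse.decrementAndGet();
                        permits.release();
                    }
                } catch (InterruptedException e) {
                    Thread.currentThread().interrupt();
                }
            });
        }
        start.countDown();
        executor.shutdown();
        try {
            executor.awaitTermination(1, TimeUnit.MINUTES);
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            throw new IllegalStateException("interrupted while waiting for tasks", e);
        }
        return peak.get();
    }

    public static void main(String[] args) {
        // 1000 callers against 500 permits: the peak stays at or below 500.
        System.out.println(peakConcurrency(500, 1000));
    }
}
```

The callers that do not get a permit are exactly the ones that would show up as time spent waiting in getResource() in a profiler.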

zenbeni