4

After migration to GCP and Cloud Memorystore, we have been getting intermittent Redis timeouts.

We are connecting from Java Spring applications running in GKE. Prior to migrating to GCP we did not have these issues.

Some key differences from old environment:

  • we consolidated 4 Redis instances into 2
  • we were running our own Redis servers as opposed to managed (Memorystore)

Key stats:

  • one instance gets ~5,000 calls/second on average
  • the other instance gets ~1,000 calls/second

We are using:

        <dependency>
            <groupId>org.springframework.data</groupId>
            <artifactId>spring-data-redis</artifactId>
            <version>2.1.10.RELEASE</version>
        </dependency>
        <dependency>
            <groupId>redis.clients</groupId>
            <artifactId>jedis</artifactId>
            <version>2.9.3</version>
        </dependency>

We are using the following code to instantiate the connections:

    public JedisConnectionFactory getJedisConnectionFactoryOne()
    {
        JedisPoolConfig jedisPoolConfig = new JedisPoolConfig();
        jedisPoolConfig.setMaxTotal(MAX_TOTAL);
        jedisPoolConfig.setMaxIdle(MAX_IDLE);
        jedisPoolConfig.setMinIdle(MIN_IDLE);
        JedisConnectionFactory jedisConnectionFactory = new JedisConnectionFactory(jedisPoolConfig);
        jedisConnectionFactory.setHostName(REDIS_HOST_1);
        jedisConnectionFactory.setPort(REDIS_PORT);
        return jedisConnectionFactory;
    }

We have tried adjusting the pool size, but it didn't help.

We get these exceptions intermittently throughout the day:

05-Sep-2019 08:18:16.597 SEVERE [http-nio-8080-exec-331] org.apache.catalina.core.StandardWrapperValve.invoke Servlet.service() for servlet [appServlet] in context with path [] threw exception [Request processing failed; 
nested exception is org.springframework.data.redis.RedisConnectionFailureException: java.net.SocketTimeoutException: Read timed out; 
nested exception is redis.clients.jedis.exceptions.JedisConnectionException: java.net.SocketTimeoutException: Read timed out] with root cause

    java.net.SocketTimeoutException: Read timed out
        at java.net.SocketInputStream.socketRead0(Native Method)
        at java.net.SocketInputStream.socketRead(SocketInputStream.java:116)
        at java.net.SocketInputStream.read(SocketInputStream.java:171)
        at java.net.SocketInputStream.read(SocketInputStream.java:141)
        at java.net.SocketInputStream.read(SocketInputStream.java:127)
        at redis.clients.util.RedisInputStream.ensureFill(RedisInputStream.java:196)
        at redis.clients.util.RedisInputStream.readByte(RedisInputStream.java:40)
        at redis.clients.jedis.Protocol.process(Protocol.java:153)
        at redis.clients.jedis.Protocol.read(Protocol.java:218)
        at redis.clients.jedis.Connection.readProtocolWithCheckingBroken(Connection.java:341)
        at redis.clients.jedis.Connection.getBinaryBulkReply(Connection.java:260)
        at redis.clients.jedis.BinaryJedis.get(BinaryJedis.java:246)
        at org.springframework.data.redis.connection.jedis.JedisStringCommands.get(JedisStringCommands.java:66)
        at org.springframework.data.redis.connection.DefaultedRedisConnection.get(DefaultedRedisConnection.java:253)
        at org.springframework.data.redis.core.DefaultValueOperations$1.inRedis(DefaultValueOperations.java:57)
        at org.springframework.data.redis.core.AbstractOperations$ValueDeserializingRedisCallback.doInRedis(AbstractOperations.java:59)
        at org.springframework.data.redis.core.RedisTemplate.execute(RedisTemplate.java:224)
        at org.springframework.data.redis.core.RedisTemplate.execute(RedisTemplate.java:184)
        at org.springframework.data.redis.core.AbstractOperations.execute(AbstractOperations.java:95)
        at org.springframework.data.redis.core.DefaultValueOperations.get(DefaultValueOperations.java:53)
        at com.ranker.app.dao.impl.redis.tag.RedisTagDAO.getTag(RedisTagDAO.java:26)
        at com.ranker.app.service.redis.tag.RedisTagService.getTag(RedisTagService.java:20)
        at com.ranker.app.service.tag.TagService.getTag(TagService.java:48)
        at com.ranker.api.service.tag.TagApiService.getTag(TagApiService.java:113)
        at com.ranker.api.service.collection.CollectionListApiService.mapListToCollectionList(CollectionListApiService.java:171)
        at com.ranker.api.service.collection.CollectionListApiService.getCollectionListsFromMappings(CollectionListApiService.java:151)
        at com.ranker.api.service.collection.CollectionListApiService.getListCollectionInfo(CollectionListApiService.java:124)
        at com.ranker.api.service.list.ListPageApiService.getListCollectionInfo(ListPageApiService.java:355)
        at com.ranker.api.service.list.ListPageApiService.getListPageModel(ListPageApiService.java:264)
        at com.ranker.api.controller.ListPageController.getPageModel(ListPageController.java:58)
        at sun.reflect.GeneratedMethodAccessor1976.invoke(Unknown Source)
        at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
        at java.lang.reflect.Method.invoke(Method.java:498)
        at org.springframework.web.method.support.InvocableHandlerMethod.doInvoke(InvocableHandlerMethod.java:189)
        at org.springframework.web.method.support.InvocableHandlerMethod.invokeForRequest(InvocableHandlerMethod.java:138)
        at org.springframework.web.servlet.mvc.method.annotation.ServletInvocableHandlerMethod.invokeAndHandle(ServletInvocableHandlerMethod.java:102)
        at org.springframework.web.servlet.mvc.method.annotation.RequestMappingHandlerAdapter.invokeHandlerMethod(RequestMappingHandlerAdapter.java:895)
        at org.springframework.web.servlet.mvc.method.annotation.RequestMappingHandlerAdapter.handleInternal(RequestMappingHandlerAdapter.java:800)
        at org.springframework.web.servlet.mvc.method.AbstractHandlerMethodAdapter.handle(AbstractHandlerMethodAdapter.java:87)
        at org.springframework.web.servlet.DispatcherServlet.doDispatch(DispatcherServlet.java:1038)
        at org.springframework.web.servlet.DispatcherServlet.doService(DispatcherServlet.java:942)
        at org.springframework.web.servlet.FrameworkServlet.processRequest(FrameworkServlet.java:1005)
        at org.springframework.web.servlet.FrameworkServlet.doPost(FrameworkServlet.java:908)
        at javax.servlet.http.HttpServlet.service(HttpServlet.java:661)
        at org.springframework.web.servlet.FrameworkServlet.service(FrameworkServlet.java:882)
        at javax.servlet.http.HttpServlet.service(HttpServlet.java:742)
        at org.apache.catalina.core.ApplicationFilterChain.internalDoFilter(ApplicationFilterChain.java:231)
        at org.apache.catalina.core.ApplicationFilterChain.doFilter(ApplicationFilterChain.java:166)
        at org.apache.tomcat.websocket.server.WsFilter.doFilter(WsFilter.java:52)
        at org.apache.catalina.core.ApplicationFilterChain.internalDoFilter(ApplicationFilterChain.java:193)
        at org.apache.catalina.core.ApplicationFilterChain.doFilter(ApplicationFilterChain.java:166)
        at org.springframework.security.web.FilterChainProxy$VirtualFilterChain.doFilter(FilterChainProxy.java:316)
        at org.springframework.security.web.access.intercept.FilterSecurityInterceptor.invoke(FilterSecurityInterceptor.java:126)
        at org.springframework.security.web.access.intercept.FilterSecurityInterceptor.doFilter(FilterSecurityInterceptor.java:90)
        at org.springframework.security.web.FilterChainProxy$VirtualFilterChain.doFilter(FilterChainProxy.java:330)
        at org.springframework.security.web.access.ExceptionTranslationFilter.doFilter(ExceptionTranslationFilter.java:114)
        at org.springframework.security.web.FilterChainProxy$VirtualFilterChain.doFilter(FilterChainProxy.java:330)
        at org.springframework.security.web.authentication.AnonymousAuthenticationFilter.doFilter(AnonymousAuthenticationFilter.java:111)
        at org.springframework.security.web.FilterChainProxy$VirtualFilterChain.doFilter(FilterChainProxy.java:330)
        at org.springframework.security.web.servletapi.SecurityContextHolderAwareRequestFilter.doFilter(SecurityContextHolderAwareRequestFilter.java:169)
        at org.springframework.security.web.FilterChainProxy$VirtualFilterChain.doFilter(FilterChainProxy.java:330)
        at com.ranker.api.security.AuthenticationTokenProcessingFilter.doFilter(AuthenticationTokenProcessingFilter.java:149)
        at org.springframework.security.web.FilterChainProxy$VirtualFilterChain.doFilter(FilterChainProxy.java:330)
        at org.springframework.security.web.header.HeaderWriterFilter.doFilterInternal(HeaderWriterFilter.java:64)
        at org.springframework.web.filter.OncePerRequestFilter.doFilter(OncePerRequestFilter.java:107)
        at org.springframework.security.web.FilterChainProxy$VirtualFilterChain.doFilter(FilterChainProxy.java:330)
        at org.springframework.security.web.context.request.async.WebAsyncManagerIntegrationFilter.doFilterInternal(WebAsyncManagerIntegrationFilter.java:53)
        at org.springframework.web.filter.OncePerRequestFilter.doFilter(OncePerRequestFilter.java:107)
        at org.springframework.security.web.FilterChainProxy$VirtualFilterChain.doFilter(FilterChainProxy.java:330)
        at org.springframework.security.web.context.SecurityContextPersistenceFilter.doFilter(SecurityContextPersistenceFilter.java:91)
        at org.springframework.security.web.FilterChainProxy$VirtualFilterChain.doFilter(FilterChainProxy.java:330)
        at org.springframework.security.web.FilterChainProxy.doFilterInternal(FilterChainProxy.java:213)
        at org.springframework.security.web.FilterChainProxy.doFilter(FilterChainProxy.java:176)
        at org.springframework.web.filter.DelegatingFilterProxy.invokeDelegate(DelegatingFilterProxy.java:357)
        at org.springframework.web.filter.DelegatingFilterProxy.doFilter(DelegatingFilterProxy.java:270)
        at org.apache.catalina.core.ApplicationFilterChain.internalDoFilter(ApplicationFilterChain.java:193)
        at org.apache.catalina.core.ApplicationFilterChain.doFilter(ApplicationFilterChain.java:166)
        at com.ranker.api.web.core.filter.CorsFilter.doFilterInternal(CorsFilter.java:32)
        at org.springframework.web.filter.OncePerRequestFilter.doFilter(OncePerRequestFilter.java:107)
        at org.apache.catalina.core.ApplicationFilterChain.internalDoFilter(ApplicationFilterChain.java:193)
        at org.apache.catalina.core.ApplicationFilterChain.doFilter(ApplicationFilterChain.java:166)
        at org.apache.catalina.core.StandardWrapperValve.invoke(StandardWrapperValve.java:199)
        at org.apache.catalina.core.StandardContextValve.invoke(StandardContextValve.java:96)
        at org.apache.catalina.authenticator.AuthenticatorBase.invoke(AuthenticatorBase.java:493)
        at org.apache.catalina.core.StandardHostValve.invoke(StandardHostValve.java:137)
        at org.apache.catalina.valves.ErrorReportValve.invoke(ErrorReportValve.java:81)
        at org.apache.catalina.core.StandardEngineValve.invoke(StandardEngineValve.java:87)
        at org.apache.catalina.connector.CoyoteAdapter.service(CoyoteAdapter.java:343)
        at org.apache.coyote.http11.Http11Processor.service(Http11Processor.java:798)
        at org.apache.coyote.AbstractProcessorLight.process(AbstractProcessorLight.java:66)
        at org.apache.coyote.AbstractProtocol$ConnectionHandler.process(AbstractProtocol.java:808)
        at org.apache.tomcat.util.net.NioEndpoint$SocketProcessor.doRun(NioEndpoint.java:1498)
        at org.apache.tomcat.util.net.SocketProcessorBase.run(SocketProcessorBase.java:49)
        at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149)
        at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624)
        at org.apache.tomcat.util.threads.TaskThread$WrappingRunnable.run(TaskThread.java:61)
        at java.lang.Thread.run(Thread.java:748)```
  • 1
    Did you get a chance to monitor the cpu and memory utilization(used memory) of the redis server. Because you are now having only 2 redis instance as opposed to 4 in the old setup. Also please the check the network bandwidth and network utilization between application server and redis. Compare this with old configuration. – Vinoth A Sep 07 '19 at 13:12
  • CPU never goes over ~20% on the higher usage instance and ~10% on the lower usage instance. Memory utilization is 60%-70% on both. I can't really compare to old configuration as those have been terminated already and our monitoring tools for those are gone as well. Current network utilization is ~5MB/s outgoing and ~1.5MB/s incoming on 1 instance, ~10MB/s outgoing and ~1MB/s incoming on the other instance. – Rafi Greenberg Sep 07 '19 at 14:10
  • It seems the default timeout for Redis client is 2 seconds, which should be more than enough, but does anyone think we need to adjust this on the client side? – Rafi Greenberg Sep 07 '19 at 19:25
  • 1
    Can you have a look at the maxmemory policies [here](https://cloud.google.com/memorystore/docs/reference/redis-configs#maxmemory_policies) i'm thinking that depending on the load of the memory it might time out sometimes due to spikes. I think the default is "volatile-lru". Have a look and let me know if it works for you, nevertheless i think that in order to better examine your exact setup you should maybe open a case with Google Cloud Platform Support. – ericcco Sep 09 '19 at 10:09
  • @eespinola Thanks, we are looking into this. We do have a case open with GCP support already, just hoping to get further insights. – Rafi Greenberg Sep 09 '19 at 21:30
  • @eespinola we are only using 60%-70% of available memory, so I don't think the maxmemory policy is relevant. – Rafi Greenberg Sep 09 '19 at 23:00

2 Answers2

1

Changing the timeout in the client from the default (2 seconds) to 5 seconds seems to have stopped the errors completely.

jedisConnectionFactory.setTimeout(5000);

It's strange because from our APM monitoring it shows p95 redis calls are < 20 μs, p99 < 20ms, and max latency < 500ms. So I'm not entirely sure why we were getting so many of these timeout errors in the first place.

We will be upgrading to spring-data-redis 2.2.x (jedis 3.x) once it is out of snapshot and in release.

0

I actually had this happen to me 9 days after a deployment using CloudRun. Every 20 minutes a couple of those errors were happening, and I had about 10-30 req/s to the redis instance.

What I did to solve it (99% solved, it rarely disconnects now) was just re-deploy my cloud run container.

Here are some visual logs to see what was going on:

enter image description here

PS: Also, my default timeout was set to 60 seconds, so the other answer didn't help me.

Kevin Danikowski
  • 4,620
  • 6
  • 41
  • 75