3

I have a redis cluster with 3 shards. Each shard has 2 nodes, 1 primary and 1 replica. I'm using spring-boot 2.0.1. Final and following is the configuration and code im using to create redis cluster connect.

pom.xml:

<dependencies>
        <dependency>
            <groupId>org.springframework.boot</groupId>
            <artifactId>spring-boot-starter-data-redis</artifactId>
        </dependency>
<dependencies>

application.yml:

redis:
    cluster:
      nodes: 172.18.0.155:7010,172.18.0.155:7011,172.18.0.155:7012,172.18.0.156:7020,172.18.0.156:7021,172.18.0.156:7022
      max-redirects: 3
    timeout: 5000
    lettuce:
      pool:
        max-active: 200
        max-idle: 8
        min-idle: 0
        max-wait: 1000
    database: 0

RedisConfig.java:

package com.central.redis.config;

import org.springframework.boot.autoconfigure.condition.ConditionalOnProperty;
import org.springframework.context.annotation.Bean;
import org.springframework.context.annotation.Configuration;
import org.springframework.context.annotation.Primary;
import org.springframework.data.redis.connection.RedisConnectionFactory;
import org.springframework.data.redis.core.RedisTemplate;
import org.springframework.data.redis.serializer.RedisSerializer;
import org.springframework.data.redis.serializer.StringRedisSerializer;

import com.central.redis.config.util.RedisObjectSerializer;

@Configuration
public class RedisConfig {

    @Primary
    @Bean("redisTemplate")
    @ConditionalOnProperty(name = "spring.redis.cluster.nodes", matchIfMissing = false)
    public RedisTemplate<String, Object> getRedisTemplate(RedisConnectionFactory factory) {
        RedisTemplate<String, Object> redisTemplate = new RedisTemplate<String, Object>();
        redisTemplate.setConnectionFactory(factory);

        RedisSerializer stringSerializer = new StringRedisSerializer();
        // RedisSerializer redisObjectSerializer = new RedisObjectSerializer();
        RedisSerializer redisObjectSerializer = new RedisObjectSerializer();
        redisTemplate.setKeySerializer(stringSerializer); 
        redisTemplate.setHashKeySerializer(stringSerializer);
        redisTemplate.setValueSerializer(redisObjectSerializer); 
        redisTemplate.afterPropertiesSet();

        redisTemplate.opsForValue().set("hello", "wolrd");
        return redisTemplate;
    }

    @Primary
    @Bean("redisTemplate")
    @ConditionalOnProperty(name = "spring.redis.host", matchIfMissing = true)
    public RedisTemplate<String, Object> getSingleRedisTemplate(RedisConnectionFactory factory) {
        RedisTemplate<String, Object> redisTemplate = new RedisTemplate<String, Object>();
        redisTemplate.setConnectionFactory(factory);
        redisTemplate.setKeySerializer(new StringRedisSerializer()); 
        redisTemplate.setValueSerializer(new RedisObjectSerializer()); 
        redisTemplate.afterPropertiesSet();
        return redisTemplate;
    }

}

We recently had an issue where one of the primary nodes in a shard of the cluster had problem, which triggered failover. So the shard had two nodes 001 (primary) and 002 (replica). 001 failed-over, and 002 became primary.

And then, the app will try to reconnect the failed nodes. And it caused the redis cluster visit to failed. My assumption was that even if one node was failed, it should automatically refreshed the topology and connect the new master node. But it didn't.

Here is the logs:

2018-11-03 17:46:21.992 [main] INFO  org.springframework.jmx.export.annotation.AnnotationMBeanExporter - Located managed bean 'environmentManager': registering with JMX server as MBean [org.springframework.cloud.context.environment:name=environmentManager,type=EnvironmentManager]
2018-11-03 17:46:22.015 [main] INFO  org.springframework.jmx.export.annotation.AnnotationMBeanExporter - Located MBean 'dataSourceLog': registering with JMX server as MBean [com.alibaba.druid.spring.boot.autoconfigure:name=dataSourceLog,type=DruidDataSourceWrapper]
2018-11-03 17:46:22.022 [main] INFO  org.springframework.jmx.export.annotation.AnnotationMBeanExporter - Located managed bean 'refreshScope': registering with JMX server as MBean [org.springframework.cloud.context.scope.refresh:name=refreshScope,type=RefreshScope]
2018-11-03 17:46:22.055 [main] INFO  org.springframework.jmx.export.annotation.AnnotationMBeanExporter - Located managed bean 'configurationPropertiesRebinder': registering with JMX server as MBean [org.springframework.cloud.context.properties:name=configurationPropertiesRebinder,context=70a9f84e,type=ConfigurationPropertiesRebinder]
2018-11-03 17:46:22.075 [main] INFO  org.springframework.jmx.export.annotation.AnnotationMBeanExporter - Located MBean 'dataSourceCore': registering with JMX server as MBean [com.alibaba.druid.spring.boot.autoconfigure:name=dataSourceCore,type=DruidDataSourceWrapper]
2018-11-03 17:46:22.078 [main] INFO  org.springframework.jmx.export.annotation.AnnotationMBeanExporter - Located MBean 'statFilter': registering with JMX server as MBean [com.alibaba.druid.filter.stat:name=statFilter,type=StatFilter]
2018-11-03 17:46:22.101 [main] INFO  org.springframework.context.support.DefaultLifecycleProcessor - Starting beans in phase 0
2018-11-03 17:46:22.136 [main] INFO  org.springframework.cloud.netflix.eureka.InstanceInfoFactory - Setting initial instance status as: STARTING
2018-11-03 17:46:22.203 [main] INFO  com.netflix.discovery.DiscoveryClient - Initializing Eureka in region us-east-1
2018-11-03 17:46:22.317 [main] INFO  com.netflix.discovery.provider.DiscoveryJerseyProvider - Using JSON encoding codec LegacyJacksonJson
2018-11-03 17:46:22.317 [main] INFO  com.netflix.discovery.provider.DiscoveryJerseyProvider - Using JSON decoding codec LegacyJacksonJson
2018-11-03 17:46:22.557 [main] INFO  com.netflix.discovery.provider.DiscoveryJerseyProvider - Using XML encoding codec XStreamXml
2018-11-03 17:46:22.558 [main] INFO  com.netflix.discovery.provider.DiscoveryJerseyProvider - Using XML decoding codec XStreamXml
2018-11-03 17:46:23.144 [main] INFO  com.netflix.discovery.shared.resolver.aws.ConfigClusterResolver - Resolving eureka endpoints via configuration
2018-11-03 17:46:23.188 [main] INFO  com.netflix.discovery.DiscoveryClient - Disable delta property : false
2018-11-03 17:46:23.189 [main] INFO  com.netflix.discovery.DiscoveryClient - Single vip registry refresh property : null
2018-11-03 17:46:23.189 [main] INFO  com.netflix.discovery.DiscoveryClient - Force full registry fetch : false
2018-11-03 17:46:23.189 [main] INFO  com.netflix.discovery.DiscoveryClient - Application is null : false
2018-11-03 17:46:23.189 [main] INFO  com.netflix.discovery.DiscoveryClient - Registered Applications size is zero : true
2018-11-03 17:46:23.189 [main] INFO  com.netflix.discovery.DiscoveryClient - Application version is -1: true
2018-11-03 17:46:23.189 [main] INFO  com.netflix.discovery.DiscoveryClient - Getting all instance registry info from the eureka server
2018-11-03 17:46:23.578 [main] INFO  com.netflix.discovery.DiscoveryClient - The response status is 200
2018-11-03 17:46:23.587 [main] INFO  com.netflix.discovery.DiscoveryClient - Starting heartbeat executor: renew interval is: 10
2018-11-03 17:46:23.596 [main] INFO  com.netflix.discovery.InstanceInfoReplicator - InstanceInfoReplicator onDemand update allowed rate per min is 4
2018-11-03 17:46:23.602 [main] INFO  com.netflix.discovery.DiscoveryClient - Discovery Client initialized at timestamp 1541238383601 with initial instances count: 7
2018-11-03 17:46:23.622 [main] INFO  org.springframework.cloud.netflix.eureka.serviceregistry.EurekaServiceRegistry - Registering application AUTH-SERVER with eureka with status UP
2018-11-03 17:46:23.624 [main] INFO  com.netflix.discovery.DiscoveryClient - Saw local status change event StatusChangeEvent [timestamp=1541238383623, current=UP, previous=STARTING]
2018-11-03 17:46:23.634 [DiscoveryClient-InstanceInfoReplicator-0] INFO  com.netflix.discovery.DiscoveryClient - DiscoveryClient_AUTH-SERVER/auth-server:172.18.0.153:8000 : registering service...
2018-11-03 17:46:23.636 [main] INFO  org.springframework.context.support.DefaultLifecycleProcessor - Starting beans in phase 2147483647
2018-11-03 17:46:23.637 [main] INFO  springfox.documentation.spring.web.plugins.DocumentationPluginsBootstrapper - Context refreshed
2018-11-03 17:46:23.706 [main] INFO  springfox.documentation.spring.web.plugins.DocumentationPluginsBootstrapper - Found 1 custom documentation plugin(s)
2018-11-03 17:46:23.720 [DiscoveryClient-InstanceInfoReplicator-0] INFO  com.netflix.discovery.DiscoveryClient - DiscoveryClient_AUTH-SERVER/auth-server:172.18.0.153:8000  - registration status: 204
2018-11-03 17:46:23.908 [DiscoveryClient-InstanceInfoReplicator-0] INFO  com.alibaba.druid.pool.DruidDataSource - {dataSource-1} inited
2018-11-03 17:46:23.940 [main] INFO  springfox.documentation.spring.web.scanners.ApiListingReferenceScanner - Scanning for api listing references
2018-11-03 17:46:24.391 [main] INFO  springfox.documentation.spring.web.readers.operation.CachingOperationNameGenerator - Generating unique operation named: rolesUsingGET_1
2018-11-03 17:46:24.525 [main] INFO  springfox.documentation.spring.web.readers.operation.CachingOperationNameGenerator - Generating unique operation named: getUserTokenInfoUsingPOST_1
2018-11-03 17:46:24.559 [main] INFO  springfox.documentation.spring.web.readers.operation.CachingOperationNameGenerator - Generating unique operation named: deleteUsingDELETE_1
2018-11-03 17:46:24.578 [main] INFO  springfox.documentation.spring.web.readers.operation.CachingOperationNameGenerator - Generating unique operation named: saveOrUpdateUsingPOST_1
2018-11-03 17:46:24.893 [DiscoveryClient-InstanceInfoReplicator-0] INFO  com.alibaba.druid.pool.DruidDataSource - {dataSource-2} inited
2018-11-03 17:46:24.947 [main] INFO  org.springframework.scheduling.annotation.ScheduledAnnotationBeanPostProcessor - No TaskScheduler/ScheduledExecutorService bean found for scheduled processing
2018-11-03 17:46:24.969 [main] INFO  org.apache.coyote.http11.Http11NioProtocol - Starting ProtocolHandler ["http-nio-8000"]
2018-11-03 17:46:24.971 [main] INFO  org.apache.tomcat.util.net.NioSelectorPool - Using a shared selector for servlet write/read
2018-11-03 17:46:25.027 [main] INFO  org.springframework.boot.web.embedded.tomcat.TomcatWebServer - Tomcat started on port(s): 8000 (http) with context path ''
2018-11-03 17:46:25.029 [main] INFO  org.springframework.cloud.netflix.eureka.serviceregistry.EurekaAutoServiceRegistration - Updating port to 8000
2018-11-03 17:46:25.034 [main] INFO  com.central.OpenAuthServerApp - Started OpenAuthServerApp in 31.95 seconds (JVM running for 33.267)
2018-11-03 17:48:47.239 [lettuce-eventExecutorLoop-1-1] INFO  io.lettuce.core.protocol.ConnectionWatchdog - Reconnecting, last destination was /172.18.0.156:7022
2018-11-03 17:48:47.239 [lettuce-eventExecutorLoop-1-2] INFO  io.lettuce.core.protocol.ConnectionWatchdog - Reconnecting, last destination was /172.18.0.156:7022
2018-11-03 17:48:56.236 [lettuce-eventExecutorLoop-1-2] INFO  io.lettuce.core.protocol.ConnectionWatchdog - Reconnecting, last destination was 172.18.0.156:7022
2018-11-03 17:48:56.236 [lettuce-eventExecutorLoop-1-1] INFO  io.lettuce.core.protocol.ConnectionWatchdog - Reconnecting, last destination was 172.18.0.156:7022
2018-11-03 17:49:04.436 [lettuce-eventExecutorLoop-1-3] INFO  io.lettuce.core.protocol.ConnectionWatchdog - Reconnecting, last destination was 172.18.0.156:7022
2018-11-03 17:49:04.436 [lettuce-eventExecutorLoop-1-2] INFO  io.lettuce.core.protocol.ConnectionWatchdog - Reconnecting, last destination was 172.18.0.156:7022
2018-11-03 17:49:20.835 [lettuce-eventExecutorLoop-1-1] INFO  io.lettuce.core.protocol.ConnectionWatchdog - Reconnecting, last destination was 172.18.0.156:7022
2018-11-03 17:49:20.835 [lettuce-eventExecutorLoop-1-3] INFO  io.lettuce.core.protocol.ConnectionWatchdog - Reconnecting, last destination was 172.18.0.156:7022
2018-11-03 17:49:50.935 [lettuce-eventExecutorLoop-1-2] INFO  io.lettuce.core.protocol.ConnectionWatchdog - Reconnecting, last destination was 172.18.0.156:7022
2018-11-03 17:49:50.935 [lettuce-eventExecutorLoop-1-1] INFO  io.lettuce.core.protocol.ConnectionWatchdog - Reconnecting, last destination was 172.18.0.156:7022
2018-11-03 17:50:21.035 [lettuce-eventExecutorLoop-1-3] INFO  io.lettuce.core.protocol.ConnectionWatchdog - Reconnecting, last destination was 172.18.0.156:7022
2018-11-03 17:50:21.037 [lettuce-eventExecutorLoop-1-2] INFO  io.lettuce.core.protocol.ConnectionWatchdog - Reconnecting, last destination was 172.18.0.156:7022
2018-11-03 17:50:51.135 [lettuce-eventExecutorLoop-1-1] INFO  io.lettuce.core.protocol.ConnectionWatchdog - Reconnecting, last destination was 172.18.0.156:7022
2018-11-03 17:50:51.135 [lettuce-eventExecutorLoop-1-3] INFO  io.lettuce.core.protocol.ConnectionWatchdog - Reconnecting, last destination was 172.18.0.156:7022
2018-11-03 17:51:21.235 [lettuce-eventExecutorLoop-1-1] INFO  io.lettuce.core.protocol.ConnectionWatchdog - Reconnecting, last destination was 172.18.0.156:7022
2018-11-03 17:51:21.236 [lettuce-eventExecutorLoop-1-2] INFO  io.lettuce.core.protocol.ConnectionWatchdog - Reconnecting, last destination was 172.18.0.156:7022
2018-11-03 17:51:23.191 [AsyncResolver-bootstrap-executor-0] INFO  com.netflix.discovery.shared.resolver.aws.ConfigClusterResolver - Resolving eureka endpoints via configuration
2018-11-03 17:51:51.335 [lettuce-eventExecutorLoop-1-3] INFO  io.lettuce.core.protocol.ConnectionWatchdog - Reconnecting, last destination was 172.18.0.156:7022
2018-11-03 17:51:51.335 [lettuce-eventExecutorLoop-1-2] INFO  io.lettuce.core.protocol.ConnectionWatchdog - Reconnecting, last destination was 172.18.0.156:7022
2018-11-03 17:52:21.435 [lettuce-eventExecutorLoop-1-1] INFO  io.lettuce.core.protocol.ConnectionWatchdog - Reconnecting, last destination was 172.18.0.156:7022
2018-11-03 17:52:21.435 [lettuce-eventExecutorLoop-1-3] INFO  io.lettuce.core.protocol.ConnectionWatchdog - Reconnecting, last destination was 172.18.0.156:7022
2018-11-03 17:52:51.535 [lettuce-eventExecutorLoop-1-1] INFO  io.lettuce.core.protocol.ConnectionWatchdog - Reconnecting, last destination was 172.18.0.156:7022
2018-11-03 17:52:51.535 [lettuce-eventExecutorLoop-1-2] INFO  io.lettuce.core.protocol.ConnectionWatchdog - Reconnecting, last destination was 172.18.0.156:7022

As you can see, there are not any error logs here. Just some lettuce ConnectionWatchdog reconnecting info, when one primary node failed. I know the reconnect behaviou is normal. But why it could affect access to the redis cluster?

Has anyone met this problem before? Did I miss any important things?

Kaol
  • 431
  • 2
  • 7
  • 16

0 Answers0