I have a Redis cluster with 3 shards. Each shard has 2 nodes: 1 primary and 1 replica. I'm using Spring Boot 2.0.1.Final, and the following is the configuration and code I'm using to create the Redis cluster connection.
pom.xml:
<dependencies>
    <dependency>
        <groupId>org.springframework.boot</groupId>
        <artifactId>spring-boot-starter-data-redis</artifactId>
    </dependency>
</dependencies>
application.yml:
spring:
  redis:
    cluster:
      nodes: 172.18.0.155:7010,172.18.0.155:7011,172.18.0.155:7012,172.18.0.156:7020,172.18.0.156:7021,172.18.0.156:7022
      max-redirects: 3
    timeout: 5000
    lettuce:
      pool:
        max-active: 200
        max-idle: 8
        min-idle: 0
        max-wait: 1000
    database: 0
RedisConfig.java:
package com.central.redis.config;

import org.springframework.boot.autoconfigure.condition.ConditionalOnProperty;
import org.springframework.context.annotation.Bean;
import org.springframework.context.annotation.Configuration;
import org.springframework.context.annotation.Primary;
import org.springframework.data.redis.connection.RedisConnectionFactory;
import org.springframework.data.redis.core.RedisTemplate;
import org.springframework.data.redis.serializer.RedisSerializer;
import org.springframework.data.redis.serializer.StringRedisSerializer;

import com.central.redis.config.util.RedisObjectSerializer;

@Configuration
public class RedisConfig {

    // Used when spring.redis.cluster.nodes is set (cluster mode).
    @Primary
    @Bean("redisTemplate")
    @ConditionalOnProperty(name = "spring.redis.cluster.nodes", matchIfMissing = false)
    public RedisTemplate<String, Object> getRedisTemplate(RedisConnectionFactory factory) {
        RedisTemplate<String, Object> redisTemplate = new RedisTemplate<>();
        redisTemplate.setConnectionFactory(factory);
        RedisSerializer<String> stringSerializer = new StringRedisSerializer();
        RedisSerializer<?> redisObjectSerializer = new RedisObjectSerializer();
        redisTemplate.setKeySerializer(stringSerializer);
        redisTemplate.setHashKeySerializer(stringSerializer);
        redisTemplate.setValueSerializer(redisObjectSerializer);
        redisTemplate.afterPropertiesSet();
        // Smoke-test write at startup to check the cluster connection.
        redisTemplate.opsForValue().set("hello", "world");
        return redisTemplate;
    }

    // Used for a single-node setup (spring.redis.host set, or the property missing).
    @Primary
    @Bean("redisTemplate")
    @ConditionalOnProperty(name = "spring.redis.host", matchIfMissing = true)
    public RedisTemplate<String, Object> getSingleRedisTemplate(RedisConnectionFactory factory) {
        RedisTemplate<String, Object> redisTemplate = new RedisTemplate<>();
        redisTemplate.setConnectionFactory(factory);
        redisTemplate.setKeySerializer(new StringRedisSerializer());
        redisTemplate.setValueSerializer(new RedisObjectSerializer());
        redisTemplate.afterPropertiesSet();
        return redisTemplate;
    }
}
We recently had an issue where one of the primary nodes in a shard of the cluster had a problem, which triggered a failover. The shard had two nodes, 001 (primary) and 002 (replica); 001 failed, and 002 was promoted to primary.
After that, the app kept trying to reconnect to the failed node, and access to the Redis cluster failed. My assumption was that even if one node failed, the client would automatically refresh the cluster topology and connect to the new primary. But it didn't.
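Some background that may help frame the question: a Redis Cluster client does not pick nodes at random. Per the Redis Cluster specification, every key maps to one of 16384 hash slots, computed as CRC16 (XMODEM variant) of the key, or of its {hash tag} if it has one, modulo 16384, and the client sends the command to whichever node its cached topology lists as the owner of that slot. A minimal JDK-only sketch of the slot calculation follows; ClusterSlot, crc16 and slotFor are illustrative names, not part of Lettuce or Spring Data Redis.

```java
import java.nio.charset.StandardCharsets;
import java.util.Arrays;

// Hypothetical helper: computes the Redis Cluster hash slot of a key
// following the key distribution model in the Redis Cluster specification.
final class ClusterSlot {

    // 16384 = 2^14, so "& (SLOT_COUNT - 1)" is the same as "% SLOT_COUNT".
    static final int SLOT_COUNT = 16384;

    private ClusterSlot() {}

    // CRC16-CCITT, XMODEM variant: polynomial 0x1021, initial value 0,
    // no input/output reflection, no final XOR.
    static int crc16(byte[] data) {
        int crc = 0;
        for (byte b : data) {
            crc ^= (b & 0xFF) << 8;
            for (int bit = 0; bit < 8; bit++) {
                crc = ((crc & 0x8000) != 0) ? (crc << 1) ^ 0x1021 : crc << 1;
                crc &= 0xFFFF;
            }
        }
        return crc;
    }

    // If the key has at least one byte between the first '{' and the first '}'
    // after it, only that substring (the hash tag) is hashed.
    static int slotFor(String key) {
        byte[] bytes = key.getBytes(StandardCharsets.UTF_8);
        int open = -1;
        for (int i = 0; i < bytes.length; i++) {
            if (bytes[i] == '{') {
                open = i;
                break;
            }
        }
        if (open >= 0) {
            for (int close = open + 1; close < bytes.length; close++) {
                if (bytes[close] == '}') {
                    if (close > open + 1) {
                        bytes = Arrays.copyOfRange(bytes, open + 1, close);
                    }
                    break;
                }
            }
        }
        return crc16(bytes) & (SLOT_COUNT - 1);
    }

    public static void main(String[] args) {
        for (String key : new String[] {"hello", "{user1000}.following", "user1000"}) {
            System.out.println(key + " -> slot " + slotFor(key));
        }
    }
}
```

Keys are hashed as raw bytes; StringRedisSerializer (used for keys in the config above) encodes them as UTF-8, which is why the sketch does the same. Since each shard's primary owns a range of slots, a client whose cached slot map still points at the failed 001 node will keep routing commands for that shard's keys to the old address until it learns the new topology.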
Here are the logs:
2018-11-03 17:46:21.992 [main] INFO org.springframework.jmx.export.annotation.AnnotationMBeanExporter - Located managed bean 'environmentManager': registering with JMX server as MBean [org.springframework.cloud.context.environment:name=environmentManager,type=EnvironmentManager]
2018-11-03 17:46:22.015 [main] INFO org.springframework.jmx.export.annotation.AnnotationMBeanExporter - Located MBean 'dataSourceLog': registering with JMX server as MBean [com.alibaba.druid.spring.boot.autoconfigure:name=dataSourceLog,type=DruidDataSourceWrapper]
2018-11-03 17:46:22.022 [main] INFO org.springframework.jmx.export.annotation.AnnotationMBeanExporter - Located managed bean 'refreshScope': registering with JMX server as MBean [org.springframework.cloud.context.scope.refresh:name=refreshScope,type=RefreshScope]
2018-11-03 17:46:22.055 [main] INFO org.springframework.jmx.export.annotation.AnnotationMBeanExporter - Located managed bean 'configurationPropertiesRebinder': registering with JMX server as MBean [org.springframework.cloud.context.properties:name=configurationPropertiesRebinder,context=70a9f84e,type=ConfigurationPropertiesRebinder]
2018-11-03 17:46:22.075 [main] INFO org.springframework.jmx.export.annotation.AnnotationMBeanExporter - Located MBean 'dataSourceCore': registering with JMX server as MBean [com.alibaba.druid.spring.boot.autoconfigure:name=dataSourceCore,type=DruidDataSourceWrapper]
2018-11-03 17:46:22.078 [main] INFO org.springframework.jmx.export.annotation.AnnotationMBeanExporter - Located MBean 'statFilter': registering with JMX server as MBean [com.alibaba.druid.filter.stat:name=statFilter,type=StatFilter]
2018-11-03 17:46:22.101 [main] INFO org.springframework.context.support.DefaultLifecycleProcessor - Starting beans in phase 0
2018-11-03 17:46:22.136 [main] INFO org.springframework.cloud.netflix.eureka.InstanceInfoFactory - Setting initial instance status as: STARTING
2018-11-03 17:46:22.203 [main] INFO com.netflix.discovery.DiscoveryClient - Initializing Eureka in region us-east-1
2018-11-03 17:46:22.317 [main] INFO com.netflix.discovery.provider.DiscoveryJerseyProvider - Using JSON encoding codec LegacyJacksonJson
2018-11-03 17:46:22.317 [main] INFO com.netflix.discovery.provider.DiscoveryJerseyProvider - Using JSON decoding codec LegacyJacksonJson
2018-11-03 17:46:22.557 [main] INFO com.netflix.discovery.provider.DiscoveryJerseyProvider - Using XML encoding codec XStreamXml
2018-11-03 17:46:22.558 [main] INFO com.netflix.discovery.provider.DiscoveryJerseyProvider - Using XML decoding codec XStreamXml
2018-11-03 17:46:23.144 [main] INFO com.netflix.discovery.shared.resolver.aws.ConfigClusterResolver - Resolving eureka endpoints via configuration
2018-11-03 17:46:23.188 [main] INFO com.netflix.discovery.DiscoveryClient - Disable delta property : false
2018-11-03 17:46:23.189 [main] INFO com.netflix.discovery.DiscoveryClient - Single vip registry refresh property : null
2018-11-03 17:46:23.189 [main] INFO com.netflix.discovery.DiscoveryClient - Force full registry fetch : false
2018-11-03 17:46:23.189 [main] INFO com.netflix.discovery.DiscoveryClient - Application is null : false
2018-11-03 17:46:23.189 [main] INFO com.netflix.discovery.DiscoveryClient - Registered Applications size is zero : true
2018-11-03 17:46:23.189 [main] INFO com.netflix.discovery.DiscoveryClient - Application version is -1: true
2018-11-03 17:46:23.189 [main] INFO com.netflix.discovery.DiscoveryClient - Getting all instance registry info from the eureka server
2018-11-03 17:46:23.578 [main] INFO com.netflix.discovery.DiscoveryClient - The response status is 200
2018-11-03 17:46:23.587 [main] INFO com.netflix.discovery.DiscoveryClient - Starting heartbeat executor: renew interval is: 10
2018-11-03 17:46:23.596 [main] INFO com.netflix.discovery.InstanceInfoReplicator - InstanceInfoReplicator onDemand update allowed rate per min is 4
2018-11-03 17:46:23.602 [main] INFO com.netflix.discovery.DiscoveryClient - Discovery Client initialized at timestamp 1541238383601 with initial instances count: 7
2018-11-03 17:46:23.622 [main] INFO org.springframework.cloud.netflix.eureka.serviceregistry.EurekaServiceRegistry - Registering application AUTH-SERVER with eureka with status UP
2018-11-03 17:46:23.624 [main] INFO com.netflix.discovery.DiscoveryClient - Saw local status change event StatusChangeEvent [timestamp=1541238383623, current=UP, previous=STARTING]
2018-11-03 17:46:23.634 [DiscoveryClient-InstanceInfoReplicator-0] INFO com.netflix.discovery.DiscoveryClient - DiscoveryClient_AUTH-SERVER/auth-server:172.18.0.153:8000 : registering service...
2018-11-03 17:46:23.636 [main] INFO org.springframework.context.support.DefaultLifecycleProcessor - Starting beans in phase 2147483647
2018-11-03 17:46:23.637 [main] INFO springfox.documentation.spring.web.plugins.DocumentationPluginsBootstrapper - Context refreshed
2018-11-03 17:46:23.706 [main] INFO springfox.documentation.spring.web.plugins.DocumentationPluginsBootstrapper - Found 1 custom documentation plugin(s)
2018-11-03 17:46:23.720 [DiscoveryClient-InstanceInfoReplicator-0] INFO com.netflix.discovery.DiscoveryClient - DiscoveryClient_AUTH-SERVER/auth-server:172.18.0.153:8000 - registration status: 204
2018-11-03 17:46:23.908 [DiscoveryClient-InstanceInfoReplicator-0] INFO com.alibaba.druid.pool.DruidDataSource - {dataSource-1} inited
2018-11-03 17:46:23.940 [main] INFO springfox.documentation.spring.web.scanners.ApiListingReferenceScanner - Scanning for api listing references
2018-11-03 17:46:24.391 [main] INFO springfox.documentation.spring.web.readers.operation.CachingOperationNameGenerator - Generating unique operation named: rolesUsingGET_1
2018-11-03 17:46:24.525 [main] INFO springfox.documentation.spring.web.readers.operation.CachingOperationNameGenerator - Generating unique operation named: getUserTokenInfoUsingPOST_1
2018-11-03 17:46:24.559 [main] INFO springfox.documentation.spring.web.readers.operation.CachingOperationNameGenerator - Generating unique operation named: deleteUsingDELETE_1
2018-11-03 17:46:24.578 [main] INFO springfox.documentation.spring.web.readers.operation.CachingOperationNameGenerator - Generating unique operation named: saveOrUpdateUsingPOST_1
2018-11-03 17:46:24.893 [DiscoveryClient-InstanceInfoReplicator-0] INFO com.alibaba.druid.pool.DruidDataSource - {dataSource-2} inited
2018-11-03 17:46:24.947 [main] INFO org.springframework.scheduling.annotation.ScheduledAnnotationBeanPostProcessor - No TaskScheduler/ScheduledExecutorService bean found for scheduled processing
2018-11-03 17:46:24.969 [main] INFO org.apache.coyote.http11.Http11NioProtocol - Starting ProtocolHandler ["http-nio-8000"]
2018-11-03 17:46:24.971 [main] INFO org.apache.tomcat.util.net.NioSelectorPool - Using a shared selector for servlet write/read
2018-11-03 17:46:25.027 [main] INFO org.springframework.boot.web.embedded.tomcat.TomcatWebServer - Tomcat started on port(s): 8000 (http) with context path ''
2018-11-03 17:46:25.029 [main] INFO org.springframework.cloud.netflix.eureka.serviceregistry.EurekaAutoServiceRegistration - Updating port to 8000
2018-11-03 17:46:25.034 [main] INFO com.central.OpenAuthServerApp - Started OpenAuthServerApp in 31.95 seconds (JVM running for 33.267)
2018-11-03 17:48:47.239 [lettuce-eventExecutorLoop-1-1] INFO io.lettuce.core.protocol.ConnectionWatchdog - Reconnecting, last destination was /172.18.0.156:7022
2018-11-03 17:48:47.239 [lettuce-eventExecutorLoop-1-2] INFO io.lettuce.core.protocol.ConnectionWatchdog - Reconnecting, last destination was /172.18.0.156:7022
2018-11-03 17:48:56.236 [lettuce-eventExecutorLoop-1-2] INFO io.lettuce.core.protocol.ConnectionWatchdog - Reconnecting, last destination was 172.18.0.156:7022
2018-11-03 17:48:56.236 [lettuce-eventExecutorLoop-1-1] INFO io.lettuce.core.protocol.ConnectionWatchdog - Reconnecting, last destination was 172.18.0.156:7022
2018-11-03 17:49:04.436 [lettuce-eventExecutorLoop-1-3] INFO io.lettuce.core.protocol.ConnectionWatchdog - Reconnecting, last destination was 172.18.0.156:7022
2018-11-03 17:49:04.436 [lettuce-eventExecutorLoop-1-2] INFO io.lettuce.core.protocol.ConnectionWatchdog - Reconnecting, last destination was 172.18.0.156:7022
2018-11-03 17:49:20.835 [lettuce-eventExecutorLoop-1-1] INFO io.lettuce.core.protocol.ConnectionWatchdog - Reconnecting, last destination was 172.18.0.156:7022
2018-11-03 17:49:20.835 [lettuce-eventExecutorLoop-1-3] INFO io.lettuce.core.protocol.ConnectionWatchdog - Reconnecting, last destination was 172.18.0.156:7022
2018-11-03 17:49:50.935 [lettuce-eventExecutorLoop-1-2] INFO io.lettuce.core.protocol.ConnectionWatchdog - Reconnecting, last destination was 172.18.0.156:7022
2018-11-03 17:49:50.935 [lettuce-eventExecutorLoop-1-1] INFO io.lettuce.core.protocol.ConnectionWatchdog - Reconnecting, last destination was 172.18.0.156:7022
2018-11-03 17:50:21.035 [lettuce-eventExecutorLoop-1-3] INFO io.lettuce.core.protocol.ConnectionWatchdog - Reconnecting, last destination was 172.18.0.156:7022
2018-11-03 17:50:21.037 [lettuce-eventExecutorLoop-1-2] INFO io.lettuce.core.protocol.ConnectionWatchdog - Reconnecting, last destination was 172.18.0.156:7022
2018-11-03 17:50:51.135 [lettuce-eventExecutorLoop-1-1] INFO io.lettuce.core.protocol.ConnectionWatchdog - Reconnecting, last destination was 172.18.0.156:7022
2018-11-03 17:50:51.135 [lettuce-eventExecutorLoop-1-3] INFO io.lettuce.core.protocol.ConnectionWatchdog - Reconnecting, last destination was 172.18.0.156:7022
2018-11-03 17:51:21.235 [lettuce-eventExecutorLoop-1-1] INFO io.lettuce.core.protocol.ConnectionWatchdog - Reconnecting, last destination was 172.18.0.156:7022
2018-11-03 17:51:21.236 [lettuce-eventExecutorLoop-1-2] INFO io.lettuce.core.protocol.ConnectionWatchdog - Reconnecting, last destination was 172.18.0.156:7022
2018-11-03 17:51:23.191 [AsyncResolver-bootstrap-executor-0] INFO com.netflix.discovery.shared.resolver.aws.ConfigClusterResolver - Resolving eureka endpoints via configuration
2018-11-03 17:51:51.335 [lettuce-eventExecutorLoop-1-3] INFO io.lettuce.core.protocol.ConnectionWatchdog - Reconnecting, last destination was 172.18.0.156:7022
2018-11-03 17:51:51.335 [lettuce-eventExecutorLoop-1-2] INFO io.lettuce.core.protocol.ConnectionWatchdog - Reconnecting, last destination was 172.18.0.156:7022
2018-11-03 17:52:21.435 [lettuce-eventExecutorLoop-1-1] INFO io.lettuce.core.protocol.ConnectionWatchdog - Reconnecting, last destination was 172.18.0.156:7022
2018-11-03 17:52:21.435 [lettuce-eventExecutorLoop-1-3] INFO io.lettuce.core.protocol.ConnectionWatchdog - Reconnecting, last destination was 172.18.0.156:7022
2018-11-03 17:52:51.535 [lettuce-eventExecutorLoop-1-1] INFO io.lettuce.core.protocol.ConnectionWatchdog - Reconnecting, last destination was 172.18.0.156:7022
2018-11-03 17:52:51.535 [lettuce-eventExecutorLoop-1-2] INFO io.lettuce.core.protocol.ConnectionWatchdog - Reconnecting, last destination was 172.18.0.156:7022
As you can see, there are no error logs here, only Lettuce ConnectionWatchdog "Reconnecting" messages after the primary node failed. I know the reconnect behavior itself is normal, but why would it affect access to the Redis cluster?
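Regarding the reconnect messages themselves: they start a few seconds apart and then settle at a steady 30-second interval. As far as I know, that is consistent with Lettuce's default reconnect delay (Delay.exponential() in its client resources), which doubles on each attempt and is capped at 30 seconds. The sketch below only illustrates a capped exponential backoff of that shape under that assumption; CappedBackoff and delayMillis are hypothetical names, and this is not Lettuce's source code.

```java
// Hypothetical sketch: capped exponential backoff that doubles from 1 ms
// and never exceeds the given upper bound. Not Lettuce's implementation.
final class CappedBackoff {

    private CappedBackoff() {}

    // attempt is 1-based: 1 -> 1 ms, 2 -> 2 ms, 3 -> 4 ms, ... until capped.
    static long delayMillis(int attempt, long upperBoundMillis) {
        if (attempt < 1) {
            throw new IllegalArgumentException("attempt must be >= 1");
        }
        // 1L << 62 is the largest positive power of two in a long;
        // any larger shift would overflow, so return the cap instead.
        if (attempt - 1 > 62) {
            return upperBoundMillis;
        }
        return Math.min(1L << (attempt - 1), upperBoundMillis);
    }

    public static void main(String[] args) {
        long cap = 30_000L; // assumed 30-second cap
        for (int attempt = 1; attempt <= 18; attempt++) {
            System.out.println("attempt " + attempt + ": " + delayMillis(attempt, cap) + " ms");
        }
    }
}
```

With a 30-second cap, the delay first reaches the cap on attempt 16 (2^15 ms = 32768 ms is clipped to 30000 ms), after which every retry is spaced by the cap.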
Has anyone run into this problem before? Am I missing something important?