Redis version: 3.2.10
ioredis version: 3.1.4
I recently tested a resharding operation on a Redis cluster while the cluster was under fairly high load. The clients were using the ioredis Node.js library, and during the reshard we occasionally saw the following error on the clients:
Too many Cluster redirections. Last error: ReplyError: MOVED 8155 172.31.37.232:6379.
Error: Too many Cluster redirections. Last error: ReplyError: MOVED 8155 172.31.37.232:6379
at Cluster.handleError (/data/api/node_modules/ioredis/lib/cluster/index.js:554:30)
at Command.command.reject (/data/api/node_modules/ioredis/lib/cluster/index.js:444:13)
at Redis.exports.returnError (/data/api/node_modules/ioredis/lib/redis/parser.js:75:18)
at JavascriptReplyParser.returnError (/data/api/node_modules/ioredis/lib/redis/parser.js:25:13)
at JavascriptReplyParser.run (/data/api/node_modules/redis-parser/lib/javascript.js:135:18)
at JavascriptReplyParser.execute (/data/api/node_modules/redis-parser/lib/javascript.js:112:10)
at Socket.<anonymous> (/data/api/node_modules/ioredis/lib/redis/event_handler.js:107:22)
at emitOne (events.js:96:13)
at Socket.emit (events.js:188:7)
at readableAddChunk (_stream_readable.js:176:18)
at Socket.Readable.push (_stream_readable.js:134:10)
at TCP.onread (net.js:548:20)
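For reference, the redirection in the error message carries the slot number and the address of the node that the cluster reports as owning that slot. A minimal sketch of pulling those fields out of the message is below; `parseRedirection` is a hypothetical helper written for illustration, not part of ioredis.

```javascript
// Hypothetical helper: split a Redis Cluster redirection error into its parts.
// The "MOVED <slot> <host>:<port>" layout is taken from the error message above.
function parseRedirection(message) {
  const match = /^(MOVED|ASK) (\d+) (.+):(\d+)$/.exec(message.trim());
  if (!match) {
    return null; // not a redirection error
  }
  return {
    type: match[1],
    slot: Number(match[2]),
    host: match[3],
    port: Number(match[4]),
  };
}

const redirect = parseRedirection('MOVED 8155 172.31.37.232:6379');
console.log(redirect);
```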
I'm currently using the ioredis defaults for the following options: maxRedirections = 16 and retryDelayOnFailover = 100 ms.
Does this mean that during the reshard it took longer than 1600 ms (16 redirections × 100 ms) for the slot data to transfer across?
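To make the arithmetic behind that figure explicit, here is a small sketch. It assumes that every redirection attempt waits retryDelayOnFailover before retrying; I have not confirmed that ioredis delays at all on a MOVED reply, which is part of what I'm asking. `worstCaseRetryWindowMs` is a made-up name for illustration.

```javascript
// The two option values are the ioredis defaults quoted above.
const clusterOptions = {
  maxRedirections: 16,
  retryDelayOnFailover: 100, // milliseconds
};

// ASSUMPTION: each redirection attempt is preceded by one full delay.
// If MOVED replies are retried immediately, the real window is much shorter.
function worstCaseRetryWindowMs(options) {
  return options.maxRedirections * options.retryDelayOnFailover;
}

console.log(worstCaseRetryWindowMs(clusterOptions)); // 1600
```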
I know that, broadly, the process of migrating a slot goes like this:
- Set the destination node slot to importing state using CLUSTER SETSLOT <slot> IMPORTING <source-node-id>.
- Set the source node slot to migrating state using CLUSTER SETSLOT <slot> MIGRATING <destination-node-id>.
- Get keys from the source node with the CLUSTER GETKEYSINSLOT command and move them into the destination node using the MIGRATE command.
- Use CLUSTER SETSLOT <slot> NODE <destination-node-id> in the source or destination.
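The steps above can be sketched as a command sequence. This is only an illustration of the order of operations: `buildSlotMigrationPlan` is a hypothetical helper that returns a plan and sends nothing to Redis, the node IDs and key names are placeholders, and `keys` stands in for whatever CLUSTER GETKEYSINSLOT would return.

```javascript
// Hypothetical helper: list which node runs which command when moving one slot.
// It builds a plan only; nothing is sent to a server.
function buildSlotMigrationPlan({ slot, sourceId, destId, destHost, destPort, keys, batchSize, timeoutMs }) {
  const s = String(slot);
  const plan = [
    // Step 1: destination marks the slot as importing from the source.
    { runOn: 'destination', command: ['CLUSTER', 'SETSLOT', s, 'IMPORTING', sourceId] },
    // Step 2: source marks the slot as migrating to the destination.
    { runOn: 'source', command: ['CLUSTER', 'SETSLOT', s, 'MIGRATING', destId] },
    // Step 3a: ask the source for a batch of keys in the slot.
    { runOn: 'source', command: ['CLUSTER', 'GETKEYSINSLOT', s, String(batchSize)] },
  ];
  for (const key of keys) {
    // Step 3b: MIGRATE host port key destination-db timeout
    plan.push({
      runOn: 'source',
      command: ['MIGRATE', destHost, String(destPort), key, '0', String(timeoutMs)],
    });
  }
  // Step 4: assign the slot to the destination node.
  plan.push({ runOn: 'source or destination', command: ['CLUSTER', 'SETSLOT', s, 'NODE', destId] });
  return plan;
}

// Placeholder values: slot and address come from the error above, the rest is invented.
const plan = buildSlotMigrationPlan({
  slot: 8155,
  sourceId: 'source-node-id',
  destId: 'dest-node-id',
  destHost: '172.31.37.232',
  destPort: 6379,
  keys: ['example:key:1', 'example:key:2'],
  batchSize: 100,
  timeoutMs: 5000,
});

for (const step of plan) {
  console.log(`${step.runOn}: ${step.command.join(' ')}`);
}
```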
But during which of the above steps will clients be unable to get or set the key data?
For context, each node was handling up to 100k ops/second during this test.