Cassandra Node.js driver times out after a node moves

Question

We use vnodes on our cluster.

I noticed that when the token space of a node changes (automatically with vnodes, during a repair, or during a cleanup after adding new nodes), the DataStax Node.js driver gets a lot of "Operation timed out - received only X responses" errors for a few minutes.

I tried both the ONE and LOCAL_QUORUM consistency levels.

I suppose this is due to the coordinator not hitting the right node just after the move. This seems to be a logical behavior (data was moved) but we really want to address this particular issue.

What do you suggest we do to avoid this? A custom retry policy? Caching? Changing the consistency level?
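If a client-side retry is the route taken, the idea can be sketched as follows: re-run a read only when it fails with a read timeout (error code 4608, which is 0x1200, Read_timeout, in the CQL native protocol), waiting a little longer before each attempt. This is a minimal, self-contained sketch; `withReadTimeoutRetry`, `isReadTimeout` and the `ServerError` shape are hypothetical names, not part of the driver. The driver's own extension point for this is a `RetryPolicy` subclass overriding `onReadTimeout`, which the sketch deliberately does not use so that it runs on its own. Retrying is only reasonable for idempotent statements such as a SELECT, and it hides the symptom rather than explaining it.

```typescript
// Native protocol error code for Read_timeout (0x1200 = 4608).
const READ_TIMEOUT = 0x1200;

// Hypothetical shape of the error object: only the field read below is declared.
interface ServerError {
  code?: number;
}

// True when the rejection value carries the Read_timeout error code.
function isReadTimeout(err: unknown): boolean {
  return typeof err === "object" && err !== null && (err as ServerError).code === READ_TIMEOUT;
}

const sleep = (ms: number) => new Promise<void>((resolve) => setTimeout(resolve, ms));

// Hypothetical helper: run `op`, retrying up to `maxRetries` times on read
// timeouts with exponential backoff; any other error is rethrown at once.
async function withReadTimeoutRetry<T>(
  op: () => Promise<T>,
  maxRetries = 3,
  baseDelayMs = 100,
): Promise<T> {
  let attempt = 0;
  while (true) {
    try {
      return await op();
    } catch (err) {
      // Give up on non-timeout errors, or once the retry budget is spent.
      if (!isReadTimeout(err) || attempt >= maxRetries) throw err;
      await sleep(baseDelayMs * 2 ** attempt);
      attempt++;
    }
  }
}
```

A call would then look like `withReadTimeoutRetry(() => client.execute("SELECT foo FROM foo_bar LIMIT 10"))`, assuming a driver version in which `execute` returns a promise when no callback is passed.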

Example of behavior

When we see this:

4/7/2016, 10:43am   Info    Host 172.31.34.155 moved from '8185241953623605265' to '-1108852503760494577'

we see a spike of these errors:

{
  "message":"Operation timed out - received only 0 responses.",
  "info":"Represents an error message from the server",
  "code":4608,
  "consistencies":1,
  "received":0,
  "blockFor":1,
  "isDataPresent":0,
  "coordinator":"172.31.34.155:9042",
  "query":"SELECT foo FROM foo_bar LIMIT 10"
}
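The fields of this error follow the body of the native protocol's Read_timeout error (code 0x1200 = 4608): the consistency level, how many replicas answered, how many were required, and a data-present flag. The sketch below decodes them into one line; the `ReadTimeoutError` interface is a hand-written assumption copied from the JSON above, not a driver type. Applied to the logged error, it shows that the coordinator, 172.31.34.155 (the same host whose move was logged), waited at ONE for a single replica and heard from none.

```typescript
// Consistency level codes as defined by the CQL native protocol.
const CONSISTENCY_NAMES: Record<number, string> = {
  0x0000: "ANY",
  0x0001: "ONE",
  0x0002: "TWO",
  0x0003: "THREE",
  0x0004: "QUORUM",
  0x0005: "ALL",
  0x0006: "LOCAL_QUORUM",
  0x0007: "EACH_QUORUM",
  0x0008: "SERIAL",
  0x0009: "LOCAL_SERIAL",
  0x000a: "LOCAL_ONE",
};

// Native protocol error code for Read_timeout (0x1200 = 4608).
const READ_TIMEOUT_CODE = 0x1200;

// Assumed shape, mirroring the fields of the logged error object.
interface ReadTimeoutError {
  code: number;
  consistencies: number; // consistency level code of the failed read
  received: number;      // replica responses the coordinator got
  blockFor: number;      // replica responses the coordinator needed
  isDataPresent: number; // 0 = the replica asked for the data did not answer
  coordinator: string;
}

// Turn the numeric fields into a human-readable summary.
function describeReadTimeout(e: ReadTimeoutError): string {
  if (e.code !== READ_TIMEOUT_CODE) {
    return `not a read timeout (code 0x${e.code.toString(16)})`;
  }
  const level = CONSISTENCY_NAMES[e.consistencies] ?? `level ${e.consistencies}`;
  const data = e.isDataPresent ? "data replica replied" : "data replica did not reply";
  return `${e.coordinator}: ${e.received}/${e.blockFor} replica responses at ${level} (${data})`;
}

const logged: ReadTimeoutError = {
  code: 4608,
  consistencies: 1,
  received: 0,
  blockFor: 1,
  isDataPresent: 0,
  coordinator: "172.31.34.155:9042",
};
console.log(describeReadTimeout(logged));
// → 172.31.34.155:9042: 0/1 replica responses at ONE (data replica did not reply)
```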

score 1 · Accepted Answer · answered Apr 07 '16 at 15:14


I suppose this is due to the coordinator not hitting the right node just after the move. This seems to be a logical behavior (data was moved) but we really want to address this particular issue.

In fact, when a new node is added, token ranges move, but Cassandra can still serve read requests from the old token ranges until the scale-out has completely finished. So the behavior you're seeing is very suspicious.

If you can reproduce this error, please activate query tracing to narrow down the issue.
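Tracing can be switched on per statement from the driver. A minimal sketch, assuming a cassandra-driver version whose `client.execute` returns a promise, accepts `{ traceQuery: true }` in its options, and reports the trace session id as `result.info.traceId` (older releases may expose it under another name, so check the documentation for the version in use). `TracingClient` and `runTraced` are hypothetical names; the interface declares only the slice of the client used here so that the sketch stands alone.

```typescript
// Assumed subset of the driver's Client; declared locally so the sketch
// compiles without the cassandra-driver package.
interface TracingClient {
  execute(
    query: string,
    params: unknown[],
    options: { traceQuery: boolean },
  ): Promise<{ info: { traceId?: unknown } }>;
}

// Run one query with tracing enabled and return the trace session id, if any.
async function runTraced(client: TracingClient, query: string): Promise<string | undefined> {
  const result = await client.execute(query, [], { traceQuery: true });
  const traceId = result.info.traceId;
  // The trace itself is stored server-side in the system_traces keyspace.
  return traceId == null ? undefined : String(traceId);
}
```

The returned id can then be looked up in cqlsh with `SELECT * FROM system_traces.events WHERE session_id = <id>;`, or the query can be rerun there after `TRACING ON`. Tracing a query against 172.31.34.155 right after a "moved" event would show which replica the read was waiting on.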

The error can also be caused by a node that is under heavy load and not replying fast enough.
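One way to tell a single overloaded node apart from a cluster-wide effect of range movement is to count the timeouts per coordinator: if they pile up on one host, that host is where to look. A minimal sketch over error objects shaped like the one in the question; `timeoutsByCoordinator`, `worstCoordinators` and `LoggedError` are hypothetical names, and only the `code` and `coordinator` fields are read.

```typescript
// Native protocol error code for Read_timeout (0x1200 = 4608).
const TIMEOUT_CODE = 0x1200;

// Only the two fields this helper reads; other fields are ignored.
interface LoggedError {
  code?: number;
  coordinator?: string;
}

// Count read timeouts per coordinator address.
function timeoutsByCoordinator(errors: LoggedError[]): Map<string, number> {
  const counts = new Map<string, number>();
  for (const e of errors) {
    if (e.code !== TIMEOUT_CODE || e.coordinator === undefined) continue;
    counts.set(e.coordinator, (counts.get(e.coordinator) ?? 0) + 1);
  }
  return counts;
}

// Coordinators sorted by number of timeouts, worst first.
function worstCoordinators(errors: LoggedError[]): [string, number][] {
  return Array.from(timeoutsByCoordinator(errors)).sort((a, b) => b[1] - a[1]);
}
```

Collecting the errors from the application's catch handler over a window around each "moved" log line and comparing the counts would show whether the spike is tied to one host.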


doanduyhai


I checked the logs in more detail, and apparently this only happened with this particular host. The funny thing is that it keeps moving from one range to another and back again. I will investigate this. Thank you for your answer; as you say, it seems unrelated. – Vincent de Lagabbe Apr 08 '16 at 14:28
Answered another question for the related issue http://stackoverflow.com/questions/36593636/one-cassandra-vnode-keeps-moving-back-and-forth-all-the-time – Vincent de Lagabbe Apr 13 '16 at 09:09
