
I've already come across several occasions where one or two of our DSE Search nodes are shown with "Down - Unresponsive" status in OpsCenter even though the node is up (i.e. I can access the Solr admin UI). Sometimes nodetool status also shows the node as down, but more often it's only OpsCenter. I found that the fix is to restart the datastax-agent service. What could be causing this?
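One way to separate the two cases described above (nodetool agrees with OpsCenter vs. only OpsCenter reports the node as down) is to compare the two views node by node. The sketch below parses `nodetool status` text, where node rows start with a two-letter code such as `UN` (Up/Normal) or `DN` (Down/Normal). The OpsCenter side is a hypothetical hand-filled dict (`opscenter_view`); no OpsCenter API is assumed, and the addresses are made up.

```python
# Sketch: flag nodes whose OpsCenter state disagrees with `nodetool status`.
# The OpsCenter states are a hypothetical dict filled in by hand; this code
# does not call any OpsCenter API.

VALID_STATUS = {"U", "D"}            # Up / Down
VALID_STATE = {"N", "L", "J", "M"}   # Normal / Leaving / Joining / Moving


def parse_nodetool_status(output: str) -> dict[str, str]:
    """Map each node address to "up" or "down" from `nodetool status` text."""
    states = {}
    for line in output.splitlines():
        parts = line.split()
        if len(parts) < 2:
            continue
        code = parts[0]
        # Node rows start with a two-letter code such as UN or DN;
        # header lines ("--  Address ...", "|/ State=...") are skipped.
        if len(code) == 2 and code[0] in VALID_STATUS and code[1] in VALID_STATE:
            states[parts[1]] = "up" if code[0] == "U" else "down"
    return states


def find_mismatches(nodetool: dict[str, str], opscenter: dict[str, str]) -> list[str]:
    """Return addresses whose OpsCenter state differs from nodetool's."""
    return sorted(
        addr for addr, state in opscenter.items()
        if addr in nodetool and nodetool[addr] != state
    )


if __name__ == "__main__":
    sample = """Datacenter: Solr
================
Status=Up/Down
|/ State=Normal/Leaving/Joining/Moving
--  Address    Load    Owns   Host ID  Token  Rack
UN  10.0.0.1   1.2 GB  33.3%  aaaa     -9     rack1
UN  10.0.0.2   1.1 GB  33.3%  bbbb     0      rack1
"""
    opscenter_view = {"10.0.0.1": "up", "10.0.0.2": "down"}  # hypothetical
    print(find_mismatches(parse_nodetool_status(sample), opscenter_view))
```

With the sample above, nodetool reports both nodes as up while the hypothetical OpsCenter view marks `10.0.0.2` as down, so that address is the one flagged. An empty result means the two views agree.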

I'd also like to follow up on my other questions:

Asked by PJ.
  • In cases where OpsCenter displays the same state as 'nodetool status', there is no bug. Otherwise, my hunch is that this is related to a known race condition that occurs when opscenterd is restarted but the agent process is not (OPSC-2485). Does it occur only after restarting opscenterd, or more often? Is there any more information you can share to help me reproduce it? – mbulman Apr 01 '14 at 14:54
  • It seems like OpsCenter got stuck in an outdated state. It might be related to the race condition you described, because a few days earlier a colleague restarted the OpsCenter daemon after making some config changes. I was able to work around the issue by restarting the daemon again (it runs on a separate machine), followed by the datastax-agent on every DSE node. – PJ. Apr 01 '14 at 15:38
  • Let's stick with that hypothesis for now, then, with the workaround being to restart the agents whenever opscenterd is restarted. We're planning to get the race condition fixed in the next patch release or two, so keep an eye out for it in the release notes. – mbulman Apr 03 '14 at 12:55
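The workaround from these comments (restart opscenterd, then restart datastax-agent on every DSE node) could be scripted roughly as follows. This is a sketch under assumptions: the host names are hypothetical, passwordless `ssh` with `sudo` rights is assumed, and the `service opscenterd restart` / `service datastax-agent restart` commands assume a package (init-script) install. It defaults to a dry run that only prints the commands.

```python
# Sketch of the workaround: restart opscenterd first, then datastax-agent on
# every DSE node. Host names are hypothetical; the `service ... restart`
# commands assume a package install and ssh access with sudo rights.
import subprocess

OPSCENTER_HOST = "opscenter.example.com"                          # hypothetical
DSE_NODES = ["dse-search-1.example.com", "dse-search-2.example.com"]  # hypothetical


def restart_plan(opscenter_host: str, nodes: list[str]) -> list[list[str]]:
    """Build the ordered ssh commands: opscenterd first, then each agent."""
    plan = [["ssh", opscenter_host, "sudo", "service", "opscenterd", "restart"]]
    for node in nodes:
        plan.append(["ssh", node, "sudo", "service", "datastax-agent", "restart"])
    return plan


def run(plan: list[list[str]], dry_run: bool = True) -> None:
    """Print the commands (dry run) or execute them in order."""
    for cmd in plan:
        if dry_run:
            print(" ".join(cmd))
        else:
            # check=True aborts at the first restart that fails.
            subprocess.run(cmd, check=True)


if __name__ == "__main__":
    run(restart_plan(OPSCENTER_HOST, DSE_NODES))  # dry run: prints commands only
```

Restarting the agents only after opscenterd is back up mirrors the order PJ. used; pass `dry_run=False` once the printed commands look right for your environment.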
