
We are trying to add a new Solr node to our cluster:

DC Cassandra

  • Cassandra node 1

DC Solr

  • Solr node 1 <-- new node (actually, a replacement for an old node; we followed the steps for "replacing a dead node")
  • Solr node 2
  • Solr node 3
  • Solr node 4
  • Solr node 5

Our Cassandra data is approximately 962 GB. The replication factor is 1 for both DCs. Is it normal for the new node to stay in the "Active - Joining" state for several days? Is there a way to check the progress?

Last week, we had to kill and restart the DSE process because it began throwing "too many open files" exceptions. Right now, the system log is full of messages about completed compaction/flushing tasks (no errors so far).

EDIT:

The node is still in the "Active - Joining" state as of this moment. It has been exactly a week since we restarted the DSE process on that node. I started monitoring the size of the solr.data directory yesterday, and so far I haven't seen an increase. The system.log is still filled with compaction/flushing messages.
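For reference, the directory-size monitoring mentioned above can be done with something like the following. This is only a minimal sketch: the function names, the sampling interval, and the example path are assumptions for illustration, not part of DSE.

```python
import os
import time


def dir_size_bytes(path):
    """Return the total size in bytes of all files under path."""
    total = 0
    for root, _dirs, files in os.walk(path):
        for name in files:
            try:
                total += os.path.getsize(os.path.join(root, name))
            except OSError:
                # Files can disappear mid-walk while compaction is running.
                pass
    return total


def watch(path, interval_s=60, samples=5):
    """Print the directory size and its change since the previous sample."""
    previous = None
    for i in range(samples):
        size = dir_size_bytes(path)
        delta = 0 if previous is None else size - previous
        print(f"{time.strftime('%H:%M:%S')} {size} bytes ({delta:+d})")
        previous = size
        if i < samples - 1:
            time.sleep(interval_s)


# Example call (hypothetical path; substitute the real solr.data location):
# watch("/var/lib/cassandra/solr.data", interval_s=300, samples=12)
```

A delta that stays at zero across many samples matches what I am seeing, but it does not say whether the node is stalled or only compacting existing data.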

One thing that bothers me is that in the OpsCenter Nodes screen (ring/list view), the node is shown under the "Cassandra" DC even though it is a Solr node. In nodetool status, nodetool ring, and dsetool ring, the node is listed under the correct DC.

EDIT:

We decided to restart the bootstrap process from scratch by deleting the data and commitlog directories. Unfortunately, during the subsequent bootstrap attempt:

  1. The stream from node 3 to node 1 (the new node) failed with an exception: ERROR [STREAM-OUT-/] 2014-04-01 01:14:40,887 CassandraDaemon.java (line 196) Exception in thread Thread[STREAM-OUT-/,5,main]
  2. The stream from node 4 to node 1 never started. The last relevant line in node 4's system.log is: StreamResultFuture.java (line 116) Received streaming plan for Bootstrap. It should have been followed by: Prepare completed. Receiving 0 files(0 bytes), sending x files(y bytes)

How can I force those streams to be retried?

PJ.
  • What version of DSE are you using? – Sven Delmas Mar 24 '14 at 14:36
  • DSE 4.0.0; although we have two nodes (including the new one) which were inadvertently upgraded to 4.0.1 due to incomplete yum auto-upgrade exclusions – PJ. Mar 24 '14 at 15:58
  • It would be helpful if you could take a few thread dumps on the joining node and post them here/somewhere, to understand what is actually going on (as it looks like the node is making progress). – sbtourist Mar 24 '14 at 17:51
  • Thread dump: https://www.dropbox.com/s/yea3ea96jvnky6p/multidump.tgz – PJ. Mar 25 '14 at 00:09
  • @PJ. What does 'dsetool ring' say about the state of the node? I want to figure out if this is an issue with the node itself (i.e., DSE), or if OpsCenter is simply reporting the state incorrectly. If it is OpsCenter, restarting the datastax-agent on that node should resolve the issue. – mbulman Apr 01 '14 at 14:35
  • 'dsetool ring' says that the node is still joining. I think we suffered some connectivity issues that led to stream failures. Unfortunately, the data state seems to be so badly messed up that even a bootstrap retry does not help (file sizes are not changing anymore). We ended up deleting the whole data directory and restarting the bootstrap process from scratch. As of this moment, the new bootstrap attempt is in progress - at least I can see that the file sizes are changing, and the node's system log is showing some successful index builds – PJ. Apr 01 '14 at 15:44
  • Any idea how to view/estimate the bootstrap progress (i.e. something like percentage done and ETA)? It would also be good if OPSC showed the netstats/tpstats/stream stats of a bootstrapping node in the ring view. Currently, only active nodes show that information. Bootstrapping nodes simply show "no active tasks", "no active streams", etc., even though the CLI counterpart (nodetool) can display the data – PJ. Apr 01 '14 at 15:49
  • You can view the bootstrap progress via 'nodetool netstats' and 'nodetool compactionstats' (if there are no streams). Reference: https://wiki.apache.org/cassandra/Operations#Bootstrap . There is not a way to view an ETA afaik. I've created a ticket for ensuring the OpsCenter displays streaming and compaction progress for joining nodes (OPSC-2543) – mbulman Apr 03 '14 at 13:01
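As a rough illustration of the suggestion in the last comment, the two commands can be captured together so that successive snapshots are easy to compare. This is a minimal sketch, assuming `nodetool` is on the PATH of the joining node; the `snapshot` helper and its `runner` parameter are hypothetical names introduced here, and the command output is returned as raw text rather than parsed.

```python
import subprocess

# The two commands named in the comment above.
COMMANDS = (("nodetool", "netstats"), ("nodetool", "compactionstats"))


def snapshot(commands=COMMANDS, runner=subprocess.run):
    """Run each command once and return its raw text output keyed by command."""
    outputs = {}
    for cmd in commands:
        key = " ".join(cmd)
        try:
            result = runner(list(cmd), capture_output=True, text=True, timeout=60)
            outputs[key] = result.stdout or result.stderr
        except (OSError, subprocess.TimeoutExpired) as exc:
            # nodetool missing or not responding; record the failure and move on.
            outputs[key] = f"failed: {exc}"
    return outputs


# Example call on the joining node:
# for name, text in snapshot().items():
#     print(f"== {name} ==\n{text}")
```

Running this every few minutes and comparing the output by eye gives a sense of whether streams and compactions are moving, although, as noted in the comment, it does not provide an ETA.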

0 Answers