We are trying to add a new Solr node to our cluster:
DC Cassandra
- Cassandra node 1
DC Solr
- Solr node 1 <-- new node (actually, a replacement for an old node; we followed the steps for "replacing a dead node")
- Solr node 2
- Solr node 3
- Solr node 4
- Solr node 5
Our Cassandra data is approximately 962gb. Replication factor is 1 for both DCs. Is it normal for the new node to be in "Active - Joining" state for several days? Is there a way to know the progress?
Last week, there was a time when we had to kill and restart the DSE process because it began throwing "too many open files" exception. Right now, the system log is full of messages about completed compaction/flushing tasks (no errors so far).
EDIT:
The node is still in "Active - Joining" state as of this moment. It's been exactly a week since we restarted the DSE process in that node. I started monitoring the size of the solr.data directory yesterday and so far I haven't seen an increase. The system.log is still filled with compacting/flushing messages.
One thing that bothers me is that in OpsCenter Nodes screen (ring/list view), the node is shown under the "Cassandra" DC even though the node is a Solr node. In nodetool status, nodetool ring, and dsetool ring, the node is listed under the correct DC.
EDIT:
We decided to restart the bootstrap process from scratch by deleting the data and commitlog directories. Unfortunately, during the subsequent bootstrap attempt:
- The stream from node 3 to node 1 (the new node) failed with an exception: ERROR [STREAM-OUT-/] 2014-04-01 01:14:40,887 CassandraDaemon.java (line 196) Exception in thread Thread[STREAM-OUT-/,5,main]
- The stream from node 4 to node 1 never started. The last relevant line in node 4's system.log is: StreamResultFuture.java (line 116) Received streaming plan for Bootstrap. It should have been followed by: Prepare completed. Receiving 0 files(0 bytes), sending x files(y bytes)
How can I force those streams to be retried?