3

I'm a SolrCloud newbie. My setup is 3 shards, 3 replicas, and an external ZooKeeper.

Today I found shard3 down. Replica3 had taken over as leader, so indexing was going to replica3 rather than shard3. I stopped Tomcat/Solr in reverse order (R3, R2, R1, S3, S2, S1) and restarted in forward order (S1, S2, S3, R1, R2, R3). I did not delete any tlog or replication.properties files. The cloud graph shows all hosts with their correct assignments. As I understand it, these assignments are set in ZooKeeper on the first startup.

My question is: how does the data that was indexed to replica3 get back to the revived shard3?

And surprisingly shard3 = 87G while replica3 = 80G.

Confused!

dan coleman

2 Answers

2

Dan,

The size of the replicas is not important, only the number of documents the collection has.

Because of the way Solr works, your collection can contain deleted documents that are only purged during merge operations. The extra 7G could be deleted documents.
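To check this rather than compare disk sizes, you could ask each core for its document counts. The following is only a rough sketch: it assumes the Luke request handler is reachable at `/admin/luke` on each core, and the host and core names used with it are placeholders, not taken from the setup above. The helper names (`luke_url`, `index_stats`, `fetch_stats`, `same_live_docs`) are made up for illustration.

```python
import json
from urllib.parse import quote
from urllib.request import urlopen


def luke_url(base_url, core):
    """Build the Luke handler URL for one core (base URL and core name are placeholders)."""
    return f"{base_url.rstrip('/')}/{quote(core)}/admin/luke?numTerms=0&wt=json"


def index_stats(luke_response):
    """Pull live and deleted document counts out of a parsed Luke response."""
    index = luke_response["index"]
    num_docs = index["numDocs"]  # live (searchable) documents
    max_doc = index["maxDoc"]    # live documents plus deleted ones not yet merged away
    return {"numDocs": num_docs, "deleted": max_doc - num_docs}


def fetch_stats(base_url, core):
    """Query one core over HTTP and return its document counts."""
    with urlopen(luke_url(base_url, core), timeout=10) as resp:
        return index_stats(json.load(resp))


def same_live_docs(stats_a, stats_b):
    """Two copies of a shard are in sync when their live document counts match."""
    return stats_a["numDocs"] == stats_b["numDocs"]
```

Calling `fetch_stats` once for the shard3 core and once for the replica3 core, then passing both results to `same_live_docs`, would show whether the two hold the same live documents. A larger `deleted` value on one side would be consistent with that side simply not having merged its segments yet.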

Yago Riveiro
0

1) As far as I know, once shard3 is back up and running, it is ZooKeeper that handles the data sync between shard3 and replica3.

2) Regarding your second question, maybe replica3 has already been optimized, which would explain its smaller size, while shard3 has yet to be optimized by Solr. (This is just a wild guess.)

Jayesh Bhoyar