Read more about SolrCloud here: https://wiki.apache.org/solr/SolrCloud
A couple of inputs from my experience.
If your application only reads data from Solr and does not write to it in real time (for example, you index through an ETL job or similar), then you can just go for a master-slave hierarchy.
Define one Master: point all writes here. If this master is down, you will no longer be able to index data.
Create 2 (or more) Slaves: replication is a built-in Solr feature, and it takes care of synchronizing data from the master at the interval we specify (say, every 20 seconds).
Create a load balancer in front of the slaves and point your application at the load balancer for reads.
Pros:
With the above setup, you don't have high availability for the master (data writes), but you do have high availability for reads until the last slave goes down.
Cons:
Assume one slave went down and you brought it back after an hour; this slave will be an hour behind the other slaves. So it's a manual task to check that its data is consistent with the other slaves before adding it back to the ELB.
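That manual check can be scripted. Below is a minimal sketch in Python, assuming each slave exposes the standard replication handler at /solr/<core>/replication and that command=indexversion reports a meaningful version on your slaves (on some versions and configurations command=details is the more informative call, so please verify against your setup). The function names, hostnames and the core name are hypothetical.

```python
import json
from urllib.request import urlopen


def fetch_index_version(base_url, core, opener=urlopen):
    """Ask one Solr core for its (indexversion, generation) pair.

    Assumes the replication handler is enabled at /solr/<core>/replication.
    """
    url = f"{base_url}/solr/{core}/replication?command=indexversion&wt=json"
    with opener(url, timeout=5) as resp:
        body = json.load(resp)
    return body["indexversion"], body["generation"]


def slave_is_in_sync(candidate, healthy_slaves, core, fetch=fetch_index_version):
    """True if the candidate reports the same index version as every healthy slave."""
    wanted = fetch(candidate, core)
    return all(fetch(peer, core) == wanted for peer in healthy_slaves)
```

You would call something like slave_is_in_sync("http://slave3:8983", ["http://slave1:8983", "http://slave2:8983"], "mycore") and only put the slave back behind the load balancer when it returns True.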
How about SolrCloud?
- No master here, so you can achieve high availability for writes too.
- No need to worry about the data inconsistency I described above; the SolrCloud architecture takes care of that.
What suits you best:
- Define an external ZooKeeper ensemble with a 3-node quorum.
- Define at least 2 Solr servers.
- Split your current index into 2 shards (by default, one shard will reside on each of the 2 Solr nodes defined in step #2).
- Define the replication factor as 2 (this will create a replica of each shard on each node).
- Define an LB pointing to the above Solr nodes.
- Point both your Solr indexing input and your application to this LB.
With the above setup, you can sustain the failure of either node.
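As a rough illustration of the shard and replica settings above, here is a small Python sketch that builds a Collections API CREATE request with 2 shards and a replication factor of 2. It assumes you create a fresh collection and reindex into it rather than splitting the existing index in place; the helper name, host and collection name are hypothetical.

```python
from urllib.parse import urlencode


def create_collection_url(solr_base, name, num_shards=2, replication_factor=2, **extra):
    """Build a Collections API CREATE URL for a SolrCloud collection.

    Any extra keyword arguments are passed through as additional query parameters.
    """
    params = {
        "action": "CREATE",
        "name": name,
        "numShards": num_shards,
        "replicationFactor": replication_factor,
    }
    params.update(extra)
    return f"{solr_base}/solr/admin/collections?{urlencode(params)}"


if __name__ == "__main__":
    # Hypothetical node and collection name; open the printed URL against any live node.
    print(create_collection_url("http://solr1:8983", "products"))
```

With only 2 nodes, 2 shards x 2 replicas means each node hosts 2 cores, so on older Solr releases you may also need to pass maxShardsPerNode=2 as an extra parameter; please check the documentation for your version.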
Let me know if you need more info on this.
Regards,
Aneesh N
-Let us learn together.