I have an existing Solr setup running as a single standalone instance. I have been asked to add resilience and high availability to it, so I would like to add replication, and I believe SolrCloud is the way to go.
I have run through the demos on the SolrCloud wiki. However, I am not sure how to add my existing Solr instance to ZooKeeper and then add more nodes for it to replicate to. Is this possible without re-running the bulk index?
The wiki states:
NOTE: When you are not using an example to start solr, make sure you upload the configuration set to zookeeper before creating the collection.
However, I am unsure which files it is referring to and how to upload them.
Current setup info:
- Solr 4.5.1
- 2 vCPUs, 24 GB RAM
- 66 million docs in index
- 58 GB index size
- Bulk index time ~50 hours
- 4000 max users
- 400 average concurrent users
- 20k updates per day
- Users search via a SolrJ application
- Querying involves grouping
Wish list:
- Existing Solr index replicated to 2 new nodes
- 3 ZooKeeper nodes to provide resilience
What I have tried:
- Downloaded ZooKeeper and ran zkServer start with default settings: OK
- Started the existing Solr setup with the option -DzkHost=actualhostname:2181
But I receive an error from Solr: "Could not load SOLR configuration".
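For reference, the two steps above look roughly like this. It is only a sketch: it assumes a Unix-like shell, the zkServer.sh script from ZooKeeper's bin directory, and the Jetty start.jar launcher from the stock Solr 4.x example directory. The directory names are placeholders, not my real paths.

```shell
# Step 1: start a single ZooKeeper instance with its default settings (this works).
# "zookeeper" is a placeholder for wherever the download was unpacked.
cd zookeeper
bin/zkServer.sh start

# Step 2: start the existing Solr instance, pointing it at that ZooKeeper.
# "solr/example" is a placeholder for the directory that contains start.jar.
cd ../solr/example
java -DzkHost=actualhostname:2181 -jar start.jar
# This second step is where Solr reports "Could not load SOLR configuration".
```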
So I guess my questions boil down to:
- For my setup, is SolrCloud the way to go, rather than, say, ReplicationHandler?
- Is it possible to add SolrCloud and ZooKeeper support without re-indexing (50 hours is a long time)?
- Which config files am I supposed to add to ZooKeeper, and how?
- Am I correct that, without additional config changes, sharding is not an option because I am using grouping in my queries?
- Should I upgrade from Solr 4.5.1? If so, how far?
- Most importantly, does my "Wish list" look like a good idea, a bad idea, or moon on a stick? If good, how do I achieve it? If bad, any suggestions?
I am pretty new to Solr (~12 months of use) and very new to ZooKeeper and SolrCloud (~2 weeks of reading and experimenting), so any advice on achieving the above would be very much appreciated.