I have an existing Solr setup, running on a standalone Solr instance. I have been asked to add resilience and high availability to this setup. So I would like to add replication to my setup, for which I believe SolrCloud is the way to go?

I have run through the demos on the SolrCloud wiki. However, I am not sure how to add my existing Solr instance to ZooKeeper and then add more nodes for it to replicate to. Is this possible without a full bulk re-index?

The wiki states

NOTE: When you are not using an example to start solr, make sure you upload the configuration set to zookeeper before creating the collection.

However, I am unsure which files it is referring to, or how to upload them.

Current setup info:

  • Solr 4.5.1
  • 2 vCPUs, 24 GB RAM
  • 66 million docs in index
  • 58 GB index size
  • Bulk index time ~50 hours
  • 4000 max users
  • 400 average concurrent users
  • 20k updates per day
  • User searching via solrJ application
  • Querying involves grouping

Wish list

  • Existing Solr Index replicated to 2 new nodes
  • 3 Zookeeper nodes to provide resilience

What I have tried:

  • Download Zookeeper, run zkServer start with default settings -OK
  • Start existing solr setup with option -DzkHost=actualhostname:2181

But I receive an error from Solr: "Could not load SOLR configuration".

So I guess my question summarises to:

  1. For my setup is SolrCloud the way to go rather than say ReplicationHandler?
  2. Is it possible to add solrCloud and ZK support without re-indexing (50hrs is a long time)?
  3. Which config files am I supposed to be adding to zk and how?
  4. Am I correct that without additional config changes sharding is not an option because I am using grouping in my queries?
  5. Should I upgrade from solr 4.5.1 if so how far?
  6. Most importantly, does my "Wish list" look like a good idea/bad idea/moon on a stick? If good, how to achieve it? If bad, any suggestions?

I am pretty new to Solr (~12 months use) and very new to Zookeeper and SolrCloud (~2 weeks reading/experimenting), so any advice on achieving the above would be very much appreciated.

Mysterion

2 Answers

With SolrCloud you can split the content across different nodes if you use multiple shards. You can start with a single shard (one leader and a few replicas), then copy the index and tlog directories from the standalone Solr you currently use to the SolrCloud leader. This way you do not need to reindex. Later on you can split the shard if the content is too big for a single node or if you want to spread the index across multiple nodes. The latest Solr release is 4.10.3. Why not use that instead of 4.5.1?
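
As a rough sketch of the copy step described above (not a tested migration procedure), something like the following could be used. The function name copy_core_data and both directory paths are hypothetical, and it assumes Solr is stopped on both the source and the destination while the files are copied.

```shell
#!/bin/sh
# Sketch: copy the Lucene index and the transaction log from a standalone
# core's data directory into the data directory of the SolrCloud leader core.
# copy_core_data and the example paths are hypothetical names.
# Assumption: Solr is stopped on both sides while this runs.
copy_core_data() {
  src_data="$1"   # e.g. /opt/solr-standalone/solr/collection1/data
  dst_data="$2"   # e.g. /opt/solr-cloud/solr/collection1/data
  [ -n "$src_data" ] && [ -n "$dst_data" ] || { echo "usage: copy_core_data SRC DST" >&2; return 1; }
  [ -d "$src_data/index" ] || { echo "no index directory under $src_data" >&2; return 1; }
  mkdir -p "$dst_data" || return 1
  # Remove any empty index the new node created on its first start
  rm -rf "$dst_data/index" "$dst_data/tlog"
  cp -R "$src_data/index" "$dst_data/index" || return 1
  # The tlog directory only exists when the update log is enabled
  if [ -d "$src_data/tlog" ]; then
    cp -R "$src_data/tlog" "$dst_data/tlog" || return 1
  fi
}
```

A call would look like `copy_core_data /path/to/old/data /path/to/new/data`, run before the SolrCloud leader is started again.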

Solr documentation explains pretty well how to create the zk content: https://cwiki.apache.org/confluence/display/solr/SolrCloud+Configuration+and+Parameters

Essentially, when you start your first SolrCloud node, you tell it where the ZooKeeper cluster is, or you can choose to run ZooKeeper on the same node as Solr. You also need to tell it where the config files are, as it will upload them to ZooKeeper.
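
A minimal sketch of what that first start might look like with the Solr 4.x Jetty example layout, using the bootstrap_confdir and collection.configName system properties. The ZooKeeper host names, the conf directory and the config name "myconf" are assumptions for illustration; the script only prints the command rather than running it.

```shell
#!/bin/sh
# Sketch only: prints the start command for the first SolrCloud node.
# ZK_HOSTS, CONF_DIR and CONF_NAME are assumed values, adjust to your setup.
ZK_HOSTS="zk1:2181,zk2:2181,zk3:2181"   # hypothetical ZooKeeper ensemble
CONF_DIR="./solr/collection1/conf"      # directory holding solrconfig.xml and schema.xml
CONF_NAME="myconf"                      # name the config set gets in ZooKeeper

first_node_cmd() {
  # bootstrap_confdir uploads CONF_DIR to ZooKeeper under collection.configName
  echo "java -DzkHost=$ZK_HOSTS -Dbootstrap_confdir=$CONF_DIR -Dcollection.configName=$CONF_NAME -jar start.jar"
}

first_node_cmd
```

The printed command is meant to be run from the directory that contains start.jar.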

Calin Grecu
  • For my setup is SolrCloud the way to go rather than say ReplicationHandler?

SolrCloud is the way forward with Solr, so I'd say yes.

  • Is it possible to add solrCloud and ZK support without re-indexing (50hrs is a long time)?

If you don't use sharding, only replicas, no need to reindex.

  • Which config files am I supposed to be adding to zk and how?

Start your first Solr with -Dbootstrap_conf=true, this will load your config files into ZK.
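
For illustration, a sketch of how the start commands could differ between the first node and the nodes added later. The host names and the port are assumptions, -DnumShards=1 is added here to match a single-shard setup, and the script prints the commands instead of executing them.

```shell
#!/bin/sh
# Sketch only: prints start commands instead of running them.
ZK_HOSTS="zk1:2181,zk2:2181,zk3:2181"   # hypothetical ZooKeeper ensemble

# First node: uploads the config of every core defined in solr.xml.
bootstrap_node_cmd() {
  echo "java -DzkHost=$ZK_HOSTS -Dbootstrap_conf=true -DnumShards=1 -jar start.jar"
}

# Later nodes: they only need to know where ZooKeeper is.
replica_node_cmd() {
  port="$1"   # Jetty port for this node (assumed value)
  echo "java -DzkHost=$ZK_HOSTS -Djetty.port=$port -jar start.jar"
}

bootstrap_node_cmd
replica_node_cmd 8984
```

The bootstrap flag is only needed on the first start; once the config is in ZooKeeper, later starts can drop it.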

  • Am I correct that without additional config changes sharding is not an option because I am using grouping in my queries?

It depends on what exactly you do with grouping; see https://wiki.apache.org/solr/DistributedSearch for what is and is not supported.

  • Should I upgrade from solr 4.5.1 if so how far?

Upgrading to the latest version is a good idea, although past Solr 4.7, you will need Java 7.

  • Most importantly, does my "Wish list" look like a good idea/bad idea/moon on a stick? If good, how to achieve it? If bad, any suggestions?

I vote for good idea; I have a similar one.
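
For the three ZooKeeper nodes in the wish list, a minimal ensemble configuration might look like the sketch below. The helper name write_zk_config, the host names zk1 to zk3 and the directories are hypothetical; the timing values are common defaults rather than tuned settings.

```shell
#!/bin/sh
# Sketch: write a minimal zoo.cfg and myid file for one member of a
# three-node ensemble. Host names and paths are hypothetical.
write_zk_config() {
  conf_dir="$1"   # where zoo.cfg goes, e.g. zookeeper/conf
  data_dir="$2"   # ZooKeeper dataDir
  my_id="$3"      # 1, 2 or 3; must match this host's server.N line
  mkdir -p "$conf_dir" "$data_dir" || return 1
  cat > "$conf_dir/zoo.cfg" <<EOF
tickTime=2000
initLimit=10
syncLimit=5
dataDir=$data_dir
clientPort=2181
server.1=zk1:2888:3888
server.2=zk2:2888:3888
server.3=zk3:2888:3888
EOF
  # Each member identifies itself through the myid file in its dataDir
  echo "$my_id" > "$data_dir/myid"
}
```

Each host would run this once with its own id, for example `write_zk_config /opt/zookeeper/conf /var/lib/zookeeper 1` on zk1. A three-node ensemble keeps a majority, and so stays available, if one node is lost.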

Yann
    Thanks @Yann for your clear answers to all my questions. -Dbootstrap was the part I was missing. I now have a simple test environment running with a cut down dataset to prove the point and will soon move this to live. Thanks again – sonicscorpion Feb 04 '15 at 18:58