
Hi, I'm developing a Rails project with Sunspot Solr and configuring SolrCloud. My environment: Rails 3.2.1, Ruby 2.1.2, Sunspot 2.1.0, Solr 4.1.6.

Why SolrCloud: I need a more stable system. The search server often goes down for maintenance, and the web application then stops working in production. So I am thinking about running two identical search servers instead of one: if one server goes down, the other keeps working.

I cannot find a good tutorial that is simple, easy to understand and detailed. I'm trying to set up SolrCloud on two servers, but I do not fully understand how it works internally:

  • synchronizing data between the two servers (is it automatic?)
  • balancing search requests between the two servers
  • when one server suddenly stops working, the other should become the master (is it automatic?)
  • are there other SolrCloud features besides those listed?
bmalets

1 Answer


Read more about SolrCloud here: https://wiki.apache.org/solr/SolrCloud

A couple of inputs from my experience.

If your application only reads data from Solr and does not write to it in real time (for example, you index through an ETL job), then you can simply go for a master/slave hierarchy.

Define one master: point all writes here. If this master is down, you will no longer be able to index data.

Create 2 (or more) slaves: this is a built-in Solr feature, and each slave synchronizes data from the master at the interval you specify (say, every 20 seconds).

Create a load balancer in front of the slaves and point your application to read data from the load balancer.

Pros: with the above setup you don't have high availability for the master (data writes), but reads stay available until the last slave goes down.

Cons: assume one slave went down and you brought it back after an hour. This slave will be one hour behind the other slaves, so it is a manual task to check its data for consistency with the other slaves before adding it back to the ELB.
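
To make the read path above more concrete, here is a minimal sketch of what the load balancer does for reads: round-robin over the slaves, skipping any that are marked down. This is only an illustration in plain Python, not how an ELB is implemented. The class name `SlavePool` and the slave URLs are hypothetical.

```python
from itertools import cycle


class SlavePool:
    """Round-robin over Solr slave URLs, skipping slaves marked as down."""

    def __init__(self, urls):
        self.urls = list(urls)
        self.healthy = set(self.urls)
        self._ring = cycle(self.urls)

    def mark_down(self, url):
        # Take a slave out of rotation, e.g. after a failed health check.
        self.healthy.discard(url)

    def mark_up(self, url):
        # Put a slave back in rotation. Do this only after verifying that
        # it has caught up with the master (the manual step described above).
        if url in self.urls:
            self.healthy.add(url)

    def next_url(self):
        # Try each slave at most once per call.
        for _ in range(len(self.urls)):
            url = next(self._ring)
            if url in self.healthy:
                return url
        raise RuntimeError("no healthy slaves left")


# Hypothetical slave addresses, for illustration only.
pool = SlavePool(["http://slave1:8983/solr", "http://slave2:8983/solr"])
pool.mark_down("http://slave1:8983/solr")
print(pool.next_url())
```

Reads keep working as long as at least one slave is in rotation, which matches the "until the last slave goes down" behaviour described above.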

How about SolrCloud?

  1. There is no master here, so you can achieve high availability for writes too.
  2. No need to worry about the data inconsistency I described above; the SolrCloud architecture takes care of that.

What suits you best:

  1. Define an external ZooKeeper quorum with 3 nodes.
  2. Define at least 2 Solr servers.
  3. Split your current index into 2 shards (by default, one shard will reside on each of the 2 Solr nodes defined in step 2).
  4. Set the replica count to 2 (this creates a replica of each shard on each node).
  5. Define a load balancer that points to the above Solr nodes.
  6. Point both your Solr indexing input and your application to this load balancer.

With the above setup, you can survive the failure of either node.
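
As a rough sketch of steps 3 and 4, the collection can be created through the Solr Collections API with a single HTTP request. The snippet below only builds the request URL. The host name `solr1.example.com` and the collection name `products` are made-up placeholders, and the parameter names (`numShards`, `replicationFactor`, `maxShardsPerNode`) are the ones I know from the Solr 4.x Collections API, so please check them against the documentation for your exact version.

```python
from urllib.parse import urlencode


def create_collection_url(base_url, name, num_shards, replication_factor,
                          max_shards_per_node):
    """Build a Collections API CREATE request URL (does not send it)."""
    params = {
        "action": "CREATE",
        "name": name,
        "numShards": num_shards,
        "replicationFactor": replication_factor,
        # 2 shards x 2 replicas on 2 nodes means 2 cores per node.
        "maxShardsPerNode": max_shards_per_node,
    }
    return f"{base_url}/admin/collections?{urlencode(params)}"


# Hypothetical host and collection name, for illustration only.
url = create_collection_url("http://solr1.example.com:8983/solr",
                            "products", 2, 2, 2)
print(url)
```

If you only want two identical copies of the index without sharding, the same call with `num_shards=1`, `replication_factor=2` and `max_shards_per_node=1` describes a single shard with one replica on each node.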

Let me know if you need more info on this.

Regards,

Aneesh N

-Let us learn together.

Aneesh Mon N
  • What is the role of ZooKeeper and Tomcat in the SolrCloud architecture? As far as I know, Solr itself is an HTTP server and works as one... – bmalets Aug 02 '15 at 19:17
  • And is it possible to build SolrCloud with only 2 identical search servers that have the same synchronized indexes? I mean that I don't need sharding; I need a system with two identical clones of the search server. – bmalets Aug 02 '15 at 19:20
  • In SolrCloud, ZooKeeper rather than Solr stores the configuration files. It also tracks the state of each node, such as which one is the leader and which ones are recovering. When a leader goes down, a new leader is elected from the remaining replicas through ZooKeeper. – Aneesh Mon N Aug 02 '15 at 19:20
  • So ZooKeeper is a "manager of nodes". Thanks for such fast answers :) – bmalets Aug 02 '15 at 19:21
  • Standalone Solr is a single shard. Yes, you can set up the cloud with a single shard and many replicas to get high availability. – Aneesh Mon N Aug 02 '15 at 19:22
  • And a last question: why is Tomcat also used with Solr? And what does "quorum" mean? – bmalets Aug 02 '15 at 19:24
  • Yes, ZooKeeper is what controls the cluster. Solr in cloud mode needs ZooKeeper; if you start it without an external one, it uses the embedded ZooKeeper. Once the cluster has started, it can keep serving reads even if ZooKeeper goes down, but it cannot take writes without ZooKeeper. – Aneesh Mon N Aug 02 '15 at 19:25
  • Jetty comes with Solr by default; Tomcat is an alternative. Both are standard web servers. Quorum is the term used for clustering ZooKeeper: the ZooKeeper cluster keeps working as long as a majority of its nodes is up. Since the Solr cluster depends on ZooKeeper, ZooKeeper becomes a single point of failure, so we have to cluster ZooKeeper too. The ideal number of ZooKeeper nodes for production is 5. – Aneesh Mon N Aug 02 '15 at 19:28
  • Does SolrCloud need ZooKeeper on an external node? As I understand it, I need two search servers with Solr and one server with ZooKeeper configured, right? – bmalets Aug 03 '15 at 12:10
  • You can keep ZooKeeper on the same node as Solr, but that does not give you high availability, because both Solr and ZooKeeper go down when the node goes down. If you have more than one ZooKeeper instance, you can spread them across different Solr nodes. SolrCloud also lets you run the whole ZooKeeper cluster and all the Solr instances on one single node by giving the Solr and ZooKeeper instances different ports. – Aneesh Mon N Aug 03 '15 at 12:15
  • Hi Aneesh, I have 2 Solr nodes. Can I run 3 ZK instances on node 1 and 2 ZK instances on node 2 to maintain quorum? The only problem would be that when node 2 (which also runs Solr) goes down, it takes 2 ZK instances with it and disables writes in the Solr cluster. Is my understanding correct? – huzefam Aug 12 '17 at 22:44
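
The quorum rule discussed in the comments can be checked with simple arithmetic: a ZooKeeper ensemble stays available only while more than half of its configured instances are running. The sketch below applies that rule to a 3 + 2 layout across two machines. The function names and the `layout` dictionary are made up for illustration.

```python
def has_quorum(total, alive):
    """A ZooKeeper ensemble needs a strict majority of its instances."""
    return alive > total // 2


def survives_node_failure(zk_per_node, failed_node):
    """Check whether quorum is kept after one machine goes down."""
    total = sum(zk_per_node.values())
    alive = total - zk_per_node[failed_node]
    return has_quorum(total, alive)


# Hypothetical layout: 3 ZK instances on node1, 2 on node2.
layout = {"node1": 3, "node2": 2}
print(survives_node_failure(layout, "node2"))  # 3 of 5 still up
print(survives_node_failure(layout, "node1"))  # only 2 of 5 still up
```

With this layout, losing node2 leaves 3 of 5 instances, which is still a majority, while losing node1 leaves only 2 of 5, which is not. So with all ZooKeeper instances on just two machines, there is always one machine whose failure breaks the quorum, which is why the instances are normally spread over at least three machines.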