I can use only one server to run both my application and my Solr server. I was wondering whether, in terms of performance and availability, it makes sense to deploy several SolrCloud and ZooKeeper nodes on this machine (e.g. using VMs or Docker). Since I will be vulnerable to hardware failure anyway, my main concerns are protection against software failure and performance.

So, will adding a few nodes (maybe 3?) give me a Solr server with higher availability or better performance? Or will it have the opposite effect?

Thematrixme

1 Answer

Using multiple JVMs on one piece of hardware isn't generally going to help much.

As you've mentioned, using many JVMs on one machine doesn't reduce your vulnerability to hardware failure. It also adds a bunch of cognitive complexity: having three replicas no longer means two can fail, unless you're extra careful about where you put each of the three.
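To make the placement point concrete, here is a minimal sketch in plain Python (not a Solr API) that counts how many physical hosts can fail before every replica of a shard is gone. The replica and host names are hypothetical and exist only for illustration.

```python
def surviving_replicas(placement, failed_host):
    """Return the replicas still alive after `failed_host` goes down.

    `placement` maps a replica name to the physical host it runs on.
    """
    return [replica for replica, host in placement.items() if host != failed_host]


def tolerated_host_failures(placement):
    """Number of physical hosts that can fail while at least one replica survives."""
    hosts = set(placement.values())
    # Losing all but one host still leaves the replicas on the last host.
    return max(len(hosts) - 1, 0)


# Three replicas in three JVMs (or containers), all on the same machine.
single_box = {"replica1": "hostA", "replica2": "hostA", "replica3": "hostA"}
# Three replicas on three separate machines.
spread_out = {"replica1": "hostA", "replica2": "hostB", "replica3": "hostC"}

print(tolerated_host_failures(single_box))  # 0
print(tolerated_host_failures(spread_out))  # 2
```

The replica count is the same in both layouts; only the number of distinct hosts changes what a hardware failure costs you.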

For most purposes, just using additional shards in a single JVM/Solr instance is simpler, and accomplishes the same performance goal of keeping your index size per core down to manageable levels. Sharding is a central feature of SolrCloud.
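As a rough illustration of the single-instance approach, the sketch below builds a Collections API `CREATE` request that puts several shards on one node. The host, port, and collection name are assumptions, and the helper function is hypothetical. Solr releases from around the time of this answer allow only one shard per node by default, hence the optional `maxShardsPerNode`; later releases dropped that parameter, so check the documentation for your version. Your setup may also need other parameters, such as `collection.configName`.

```python
from urllib.parse import urlencode


def create_collection_url(base_url, name, num_shards,
                          replication_factor=1, max_shards_per_node=None):
    """Build (but do not send) a Collections API CREATE request URL."""
    params = {
        "action": "CREATE",
        "name": name,
        "numShards": num_shards,
        "replicationFactor": replication_factor,
    }
    # Only needed on older Solr versions that cap shards per node at 1.
    if max_shards_per_node is not None:
        params["maxShardsPerNode"] = max_shards_per_node
    return f"{base_url}/admin/collections?{urlencode(params)}"


# Hypothetical local single-node SolrCloud instance.
url = create_collection_url("http://localhost:8983/solr", "mycollection",
                            num_shards=3, max_shards_per_node=3)
print(url)
```

The URL can then be requested with a browser, `curl`, or any HTTP client.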

The only exception to this I'm aware of is if you're dealing with an index or usage pattern that requires a very large JVM heap. A very large JVM heap can lead to high max GC pause times, and GC tuning can only help so much. In this case, using multiple JVMs, with a single replica/shard per JVM, can constrain the worst-case GC pause to that required for a single replica.

You also mention ZooKeeper, so it's worth noting that ZK is a somewhat different beast. You should probably host ZK separately, always use an odd number of ZK nodes, and never run more than one per physical host.
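The odd-number advice follows from majority quorum arithmetic: an ensemble stays available only while more than half of its nodes are up. A small sketch of that arithmetic (the function names are mine, not part of any ZooKeeper API):

```python
def quorum_size(ensemble_size):
    """Smallest number of nodes that forms a strict majority."""
    return ensemble_size // 2 + 1


def tolerated_failures(ensemble_size):
    """How many nodes can fail while a majority is still reachable."""
    return ensemble_size - quorum_size(ensemble_size)


for n in range(1, 7):
    print(n, quorum_size(n), tolerated_failures(n))
```

Going from 3 to 4 nodes (or from 5 to 6) raises the quorum size without letting you lose any more nodes, so the extra even node buys nothing. And if all ZK nodes share one physical host, a single hardware failure takes out the whole ensemble regardless of its size.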

randomstatistic
  • Thank you for your answer, you confirmed what I suspected. I don't think I have an index that large, so I was thinking more about replicating than sharding on this machine. But out of curiosity, what do you consider to be a very large JVM heap? – Thematrixme Jun 01 '16 at 08:11
  • I'd consider 2-8G to be "normal". But since in this context the thing that matters is the GC pause time, it's really more a matter of how much pause you can tolerate. – randomstatistic Jun 02 '16 at 16:19
  • Then I think we fit in the "normal" case. And I don't think the GC pause time will matter. Anyway, thank you very much! – Thematrixme Jun 03 '16 at 07:44