Should I hit ZooKeeper or a Solr node for indexing?

Question

I read the following documents:

Based on these, my understanding is:

1. We can run SolrCloud with an embedded ZooKeeper, or
2. with an external ZooKeeper ensemble.
3. But irrespective of which type of ZooKeeper we use, for querying, indexing, etc., is it better for performance to hit ZooKeeper or to hit a Solr node? What is the correct / best practice?
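Whichever of options 1 and 2 is used, a cloud-aware client is handed a ZooKeeper connect string rather than a Solr URL. The minimal Python sketch below shows the two shapes that string takes. It assumes the bin/solr default that the embedded ZooKeeper listens on the Solr port + 1000 (9983 for 8983); the helper name zk_connect_string and the hostnames zk1..zk3 are hypothetical.

```python
def zk_connect_string(solr_port=8983, ensemble=None, chroot=""):
    """Return the ZooKeeper connect string a cloud-aware client would be given.

    ensemble: list of "host:port" entries for an external ZooKeeper ensemble
              (option 2), or None for the embedded ZooKeeper (option 1), which
              bin/solr starts on the Solr port + 1000 by default.
    chroot:   optional chroot path such as "/solr"; it is appended once,
              after the last host, not after every host.
    """
    if ensemble:
        hosts = ",".join(ensemble)
    else:
        hosts = f"localhost:{solr_port + 1000}"
    return hosts + chroot


# Option 1: embedded ZooKeeper started alongside the first Solr node.
print(zk_connect_string())
# -> localhost:9983

# Option 2: external three-node ensemble (hostnames are hypothetical).
print(zk_connect_string(ensemble=["zk1:2181", "zk2:2181", "zk3:2181"], chroot="/solr"))
# -> zk1:2181,zk2:2181,zk3:2181/solr
```

So the choice in #3 does not depend on options 1 or 2: a client is either given a string like this, or the URL of one Solr node.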

To elaborate on #3: the distributed requests documentation, for example, says to hit a Solr node (http://localhost:8983/solr/gettingstarted/select?q=*:*) to query across all shards, but Solr Kafka Connect, SolrJ, etc. let you connect either via ZooKeeper or directly to a Solr node.
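The "hit a Solr node" style from that example can be sketched in a few lines of standard-library Python. The helper select_url is hypothetical and only builds the URL; no request is sent.

```python
from urllib.parse import urlencode


def select_url(node_base_url, collection, **params):
    """Build a /select URL aimed at a single Solr node (plain HTTP, no ZooKeeper).

    In SolrCloud, the node that receives this request fans it out to one
    replica of each shard of `collection` and merges the results.
    """
    # Keep '*' and ':' literal for readability; percent-encoded forms work too.
    query = urlencode(params, safe="*:")
    return f"{node_base_url.rstrip('/')}/{collection}/select?{query}"


print(select_url("http://localhost:8983/solr", "gettingstarted", q="*:*"))
# -> http://localhost:8983/solr/gettingstarted/select?q=*:*
```

Note that a ZooKeeper address never appears in a URL like this: ZooKeeper's client port speaks the ZooKeeper protocol, not Solr's HTTP API. "Connecting via ZooKeeper" means the client reads cluster state from ZooKeeper and then still sends its HTTP requests to Solr nodes.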

In some places I see that ZooKeeper provides failover and load balancing; in others, that ZooKeeper maintains the cluster state, holds the configs for the Solr nodes, etc.
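These descriptions fit together: ZooKeeper stores the cluster metadata (which nodes are live, which replicas sit on which node, plus the configsets), and a client that reads that metadata can do its own failover and load balancing. The heavily simplified Python model below illustrates this. ZK_SNAPSHOT is a hand-written, hypothetical snapshot whose field names only loosely follow the /live_nodes and /collections/<name>/state.json znodes; base_url and replica_urls are hypothetical helpers.

```python
from urllib.parse import unquote

# Hypothetical, simplified snapshot of cluster metadata kept in ZooKeeper.
# Real state.json entries carry many more fields.
ZK_SNAPSHOT = {
    "live_nodes": {"10.0.0.1:8983_solr", "10.0.0.2:8983_solr"},
    "collections": {
        "gettingstarted": {
            "shards": {
                "shard1": {"replicas": {
                    "core_node1": {"node_name": "10.0.0.1:8983_solr", "state": "active"},
                    "core_node3": {"node_name": "10.0.0.3:8983_solr", "state": "active"},
                }},
                "shard2": {"replicas": {
                    "core_node2": {"node_name": "10.0.0.2:8983_solr", "state": "active"},
                }},
            }
        }
    },
}


def base_url(node_name, scheme="http"):
    # Solr node names look like "host:port_context"; the URL-encoded context
    # after the first "_" becomes the path ("solr" -> "/solr").
    host_port, _, context = node_name.partition("_")
    context = unquote(context)
    return f"{scheme}://{host_port}" + (f"/{context}" if context else "")


def replica_urls(snapshot, collection):
    """Map each shard to the base URLs of its active replicas on live nodes."""
    live = snapshot["live_nodes"]
    shards = snapshot["collections"][collection]["shards"]
    return {
        shard: sorted(
            base_url(r["node_name"])
            for r in info["replicas"].values()
            if r["state"] == "active" and r["node_name"] in live
        )
        for shard, info in shards.items()
    }


print(replica_urls(ZK_SNAPSHOT, "gettingstarted"))
```

Because 10.0.0.3 is missing from live_nodes, its replica is skipped even though its recorded state says "active". That is the kind of failover decision a client can only make if it reads the cluster state; ZooKeeper itself never answers the query.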

Is there any official documentation that explains the role of ZooKeeper in Solr/SolrCloud, and when/why someone would hit the ZooKeeper address instead of a Solr node?

Any leads are much appreciated. Thanks.

If you have ZooKeeper connectivity in your client, you can send your query directly to the nodes responsible for the collection you're asking for; otherwise, the request has to be routed inside your Solr nodes. Both will work, but in the first case you're asking the node that has the relevant data directly, instead of having the query go through an extra hop and take up time and memory on an additional Solr node. The main reason to connect to a Solr node directly in a cloud setting is legacy clients that do not support retrieving state from ZooKeeper. - MatsLindh, Apr 17 '22 at 18:19
Thanks for going through my post and replying. Is there any official document that explains this? - Hari Rao, Apr 24 '22 at 05:33
https://cwiki.apache.org/confluence/display/solr/ZooKeeperIntegration is the main source, written when the feature was added in Solr 4. - MatsLindh, Apr 24 '22 at 09:50
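For the indexing case in the title, the "extra hop" from the first comment can be modelled by counting hops to the shard leader. This is a minimal sketch under stated assumptions: LEADERS, route_zk_aware and route_via_fixed_node are hypothetical, and the target shard is passed in directly, whereas real SolrCloud derives it from the document id using the collection's router (compositeId by default).

```python
# Hypothetical shard -> leader map, as a ZooKeeper-aware client would learn it
# from the cluster state. The target shard is passed in rather than computed
# from the document id.
LEADERS = {
    "shard1": "http://10.0.0.1:8983/solr",
    "shard2": "http://10.0.0.2:8983/solr",
}


def route_zk_aware(target_shard, leaders=LEADERS):
    """ZooKeeper-aware client: it already knows the leader and goes straight there."""
    return ["client", leaders[target_shard]]


def route_via_fixed_node(target_shard, entry_node, leaders=LEADERS):
    """Plain HTTP client: always talks to one configured node, which forwards if needed."""
    path = ["client", entry_node]
    leader = leaders[target_shard]
    if entry_node != leader:
        path.append(leader)  # the extra internal hop described in the comment
    return path


for shard in ("shard1", "shard2"):
    direct = route_zk_aware(shard)
    via_node = route_via_fixed_node(shard, "http://10.0.0.1:8983/solr")
    print(shard, "hops:", len(direct) - 1, "vs", len(via_node) - 1)
# -> shard1 hops: 1 vs 1
# -> shard2 hops: 1 vs 2
```

In both paths the leader then forwards the update to the shard's other replicas; the difference is only the extra hop before the leader is reached. SolrJ's CloudSolrClient is the usual example of a ZooKeeper-aware client that routes updates to shard leaders this way.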