I have read the following documents:
- https://solr.apache.org/guide/8_11/shards-and-indexing-data-in-solrcloud.html
- https://solr.apache.org/guide/8_11/distributed-requests.html
- https://solr.apache.org/guide/8_11/getting-started-with-solrcloud.html
- https://solr-user.lucene.apache.narkive.com/b1bL4ZMQ/does-cloudsolrserver-hit-zookeeper-for-every-request
Based on these, my understanding is:
- We can run SolrCloud with an embedded ZooKeeper, or
- with an external ZooKeeper ensemble.
- But regardless of which kind of ZooKeeper setup is used, for querying, indexing, etc., does connecting through ZooKeeper perform better, or is it better to hit a Solr node directly? What is the correct / best practice?
To elaborate on the third point: the distributed requests documentation, for example, says to hit a Solr node (http://localhost:8983/solr/gettingstarted/select?q=*:*) to query across all shards, but I see Solr Kafka Connect, SolrJ, etc. offering connections both via ZooKeeper and directly to a Solr node.
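To make the two options concrete, here is a minimal Python sketch of the two kinds of address I mean. The helper names, the `zk1:2181`-style host names, and the `/solr` chroot are placeholders of my own and not taken from the docs; `localhost:9983` assumes the embedded ZooKeeper from the getting-started example. Nothing here contacts Solr or ZooKeeper, it only builds the strings a client would be configured with.

```python
from urllib.parse import urlencode


def direct_node_query_url(node_base_url, collection, query="*:*"):
    """Build the /select URL for sending a query straight to one Solr node."""
    params = urlencode({"q": query})  # percent-encodes '*' and ':'
    return f"{node_base_url}/solr/{collection}/select?{params}"


def zookeeper_connect_string(zk_hosts, chroot=""):
    """Build the comma-separated host list (plus optional chroot) that
    ZooKeeper-aware clients are given instead of a Solr node URL."""
    return ",".join(zk_hosts) + chroot


# Option A: talk to a single Solr node over HTTP.
print(direct_node_query_url("http://localhost:8983", "gettingstarted"))

# Option B: give the client the ZooKeeper address(es) instead.
print(zookeeper_connect_string(["localhost:9983"]))  # assumed embedded ZooKeeper
print(zookeeper_connect_string(["zk1:2181", "zk2:2181", "zk3:2181"], "/solr"))  # hypothetical ensemble
```

My question is essentially which of these two a client should be configured with, and why.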
In some places I read that ZooKeeper provides failover and load balancing; in others, that ZooKeeper maintains the cluster state, holds the configs of the Solr nodes, etc.
Is there any official documentation that explains the role of ZooKeeper in Solr/SolrCloud, and when/why someone would connect to the ZooKeeper address rather than to a Solr node?
Any leads are much appreciated. Thanks