Using solrj and LBHttpSolrClient to access a single solrcloud instance

Question

Is using the LBHttpSolrClient within solrj to access a single solrcloud instance less robust than using the default solrj and zookeeper behavior? Can it correctly load balance over a single solrcloud instance?

The solrcloud instance that I have available has a collection with about 9 million documents, spread over three shards with about 3 million documents per shard. There are three nodes (servers) in the solrcloud; the collection has 3 shards, a replicationFactor of 2, and a maxShardsPerNode of 2. The 3 zookeeper nodes for this solrcloud instance also run on these same three servers.
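Worked through, this layout gives six replicas in total. Because maxShardsPerNode caps each node at two replicas and there are exactly six replicas for three nodes, every node hosts exactly two replicas, so each node holds at most two of the three shards and no single node contains the whole collection:

$$
3\ \text{shards} \times 2\ (\texttt{replicationFactor}) = 6\ \text{replicas}, \qquad \frac{6\ \text{replicas}}{3\ \text{nodes}} = 2\ \text{per node} = \texttt{maxShardsPerNode}
$$

$$
\frac{\approx 9\,000\,000\ \text{documents}}{3\ \text{shards}} \approx 3\,000\,000\ \text{documents per shard}
$$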

This is the basic code that I've been told to use:

import org.apache.solr.client.solrj.impl.CloudSolrClient;
import org.apache.solr.client.solrj.impl.LBHttpSolrClient;

// ZooKeeper ensemble (the three ZooKeeper nodes run on the Solr servers)
String zkUrls = "solrd1:2181,solrd2:2181,solrd3:2181";
// Base URLs of the three Solr nodes (must be an array, not a single String)
String[] solrUrls = {"http://solrd1:8983", "http://solrd2:8983", "http://solrd3:8983"};

LBHttpSolrClient.Builder lbclient =
    new LBHttpSolrClient.Builder().withBaseSolrUrls(solrUrls);
CloudSolrClient solr = new CloudSolrClient.Builder()
    .withLBHttpSolrClientBuilder(lbclient)
    .withZkHost(zkUrls)
    .build();
// defaultCollection holds the name of the collection to query
solr.setDefaultCollection(defaultCollection);

Is this LBHttpSolrClient client able to properly use the provided solrUrls, given that each URL listed in that variable is just a node within a single solrcloud? Does this load-balancing client automatically query all the other nodes so that the results are complete for the whole collection, instead of covering just the shards that exist on that node?

If the LBHttpSolrClient client is the correct way to access a single solrcloud instance (better than solrj and zookeeper), is there a better way to let zookeeper provide the base solr URLs? My impression is that the LBHttpSolrClient client predates the whole solrcloud setup and was a way to load balance over multiple standalone instances of solr; if that's the case, would the LBHttpSolrClient client be obsolete compared to solrj and zookeeper?

References:

Is there any loss of functionality if I use load balancer which does not communicate with zookeeper in solrcloud?
- This link has a title suggesting it may provide some insight into the same questions that I'm asking, but it has no answers.
Loadbalancer and Solrcloud
- This link discusses how solrj and zookeeper work together, but does not address whether the LBHttpSolrClient client is less robust or whether it will work correctly on a single instance of a small solrcloud.
SolrCloud load-balancing
- Does not address whether solrj and zookeeper are better suited than the LBHttpSolrClient client.

score 2 · Answer 1 · answered May 17 '17 at 07:19


I think you are overcomplicating things: you can skip the LBHttpSolrClient in your code entirely, and SolrJ will create the needed instance behind the scenes.

In short, CloudSolrClient uses LBHttpSolrClient to send requests to the right Solr instances. If you want to get the most out of your SolrCloud setup, use CloudSolrClient. If you use just an LBHttpSolrClient (without CloudSolrClient), then you will not know that a Solr node has gone down, for instance, until you get failed requests.
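To make the difference concrete, here is a toy Java model, not SolrJ code: `RoutingSketch`, `StaticRoundRobin`, `ClusterAware` and the in-memory `liveNodes` set are all hypothetical. It contrasts a load balancer that only has a fixed URL list with a router that chooses from the live-node list, the kind of information ZooKeeper tracks for CloudSolrClient. It assumes one of the three nodes is down and counts how many requests hit that node before it is avoided.

```java
import java.util.ArrayList;
import java.util.HashSet;
import java.util.List;
import java.util.Set;
import java.util.TreeSet;

public class RoutingSketch {

    // Hypothetical stand-in for the live-node list ZooKeeper tracks.
    // A TreeSet keeps iteration order deterministic for the demo.
    static final Set<String> liveNodes = new TreeSet<>();

    // Simulated HTTP request: succeeds only if the target node is up.
    static boolean send(String node) {
        return liveNodes.contains(node);
    }

    // Static load balancer: knows only the fixed URL list it was given,
    // so it learns a node is down only when a request to it fails.
    // (The real LBHttpSolrClient also re-checks dead servers periodically;
    // that is omitted here.)
    static class StaticRoundRobin {
        private final List<String> servers;
        private final Set<String> zombies = new HashSet<>();
        private int next = 0;
        int failedAttempts = 0;

        StaticRoundRobin(List<String> servers) {
            this.servers = new ArrayList<>(servers);
        }

        String request() {
            for (int tries = 0; tries < servers.size(); tries++) {
                String node = servers.get(next);
                next = (next + 1) % servers.size();
                if (zombies.contains(node)) {
                    continue;          // already known to be dead
                }
                if (send(node)) {
                    return node;
                }
                failedAttempts++;      // discovered the outage by failing
                zombies.add(node);     // skip this node from now on
            }
            throw new IllegalStateException("no server available");
        }
    }

    // Cluster-state-aware router: chooses only among nodes currently
    // registered as live, so a downed node is never tried at all.
    static class ClusterAware {
        private int next = 0;
        int failedAttempts = 0;

        String request() {
            List<String> candidates = new ArrayList<>(liveNodes);
            if (candidates.isEmpty()) {
                throw new IllegalStateException("no live nodes");
            }
            String node = candidates.get(next % candidates.size());
            next++;
            if (!send(node)) {
                failedAttempts++;
            }
            return node;
        }
    }

    public static void main(String[] args) {
        List<String> urls = List.of("solrd1:8983", "solrd2:8983", "solrd3:8983");
        liveNodes.addAll(urls);
        liveNodes.remove("solrd2:8983");   // simulate solrd2 going down

        StaticRoundRobin lb = new StaticRoundRobin(urls);
        ClusterAware cloud = new ClusterAware();
        for (int i = 0; i < 6; i++) {
            lb.request();
            cloud.request();
        }
        System.out.println("static LB failed attempts: " + lb.failedAttempts);
        System.out.println("cluster-aware failed attempts: " + cloud.failedAttempts);
    }
}
```

Running `main` reports one failed attempt for the static balancer and none for the cluster-aware router. The real clients differ in many details (retries, timeouts, shard-aware routing); the sketch only illustrates where knowledge of a downed node comes from.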


Persimmonium


So are you saying that the LBHttpSolrClient is more primitive than just using solrj and zookeeper? That is my impression, since it appears as if zookeeper is ignored altogether when going through LBHttpSolrClient, but I cannot find documentation to support or disprove that impression. So even though in my example I have `.withZkHost(zkUrls)`, I totally suspect that does nothing, since you have to provide the solr URLs prior to that. Thanks. – r Blue May 18 '17 at 00:00

if by 'using solrj and zookeeper' you mean using the code above without the .withLBHttpSolrClientBuilder() line, then yes – Persimmonium May 18 '17 at 11:25
So to be clear, by using the `.withLBHttpSolrClientBuilder()` it basically shuts down any benefit you will get from zookeeper? In other words, it bypasses any and all zookeeper functionality. – r Blue May 25 '17 at 20:04
Would there be _any_ reason at all to use the `.withLBHttpSolrClientBuilder()` directive? Background: I'm being forced to use it to pass ACL credentials to solr, which makes no sense, but I cannot find a better alternative for the type of ACLs they are using or reasons why to keep clear of the LBHttpSolrClient. But that's probably better left to another post. – r Blue May 25 '17 at 20:08
Your current reason might be valid. I have not looked into it in detail, but if you don't provide an LBHttpSolrClient, SolrJ will create one for you, so it is not like you are doing something crazy here. – Persimmonium May 25 '17 at 21:03
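The point that SolrJ creates a load balancer when you do not supply one follows a common builder pattern: an optional collaborator that gets a default when absent. The toy Java below sketches that pattern only. Every class and method in it (`CloudClientBuilder`, `withLoadBalancerBuilder`, `LoadBalancerBuilder.withCredentials`, and so on) is hypothetical and is not SolrJ's actual source; the credentials hook merely stands in for the kind of HTTP-level setting, such as ACL credentials, that motivates passing a custom builder.

```java
public class BuilderDefaultSketch {

    // Hypothetical stand-in for a load-balancing HTTP client.
    static class LoadBalancer {
        final String credentials; // null when no credentials were configured

        LoadBalancer(String credentials) {
            this.credentials = credentials;
        }
    }

    // Hypothetical builder for the load balancer; withCredentials stands in
    // for whatever HTTP-level configuration (e.g. for ACLs) is needed.
    static class LoadBalancerBuilder {
        private String credentials;

        LoadBalancerBuilder withCredentials(String user, String password) {
            this.credentials = user + ":" + password;
            return this;
        }

        LoadBalancer build() {
            return new LoadBalancer(credentials);
        }
    }

    // Hypothetical cloud-aware client that always owns a load balancer.
    static class CloudClient {
        final String zkHost;
        final LoadBalancer lb;

        CloudClient(String zkHost, LoadBalancer lb) {
            this.zkHost = zkHost;
            this.lb = lb;
        }
    }

    static class CloudClientBuilder {
        private String zkHost;
        private LoadBalancerBuilder lbBuilder; // optional

        CloudClientBuilder withZkHost(String zkHost) {
            this.zkHost = zkHost;
            return this;
        }

        CloudClientBuilder withLoadBalancerBuilder(LoadBalancerBuilder b) {
            this.lbBuilder = b;
            return this;
        }

        CloudClient build() {
            // No load-balancer builder supplied: fall back to a default one.
            LoadBalancerBuilder b = (lbBuilder != null) ? lbBuilder : new LoadBalancerBuilder();
            return new CloudClient(zkHost, b.build());
        }
    }

    public static void main(String[] args) {
        String zk = "solrd1:2181,solrd2:2181,solrd3:2181";
        CloudClient plain = new CloudClientBuilder().withZkHost(zk).build();
        CloudClient custom = new CloudClientBuilder()
            .withZkHost(zk)
            .withLoadBalancerBuilder(new LoadBalancerBuilder().withCredentials("user", "secret"))
            .build();
        System.out.println("plain has LB: " + (plain.lb != null) + ", credentials: " + plain.lb.credentials);
        System.out.println("custom has LB: " + (custom.lb != null) + ", credentials: " + custom.lb.credentials);
    }
}
```

In this sketch, either way the resulting client owns a load balancer; supplying your own builder only changes how that load balancer is configured. Whether SolrJ's real CloudSolrClient behaves the same way in every version is an assumption worth checking against its Javadoc.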

