I would like to set up an ultra-fast SolrCloud system, ideally with guaranteed low response times. The issue is that typically around 1-5% of Solr responses are slow, e.g. due to leader election, queries on frequent terms that require more merging, etc.
Question: Has anybody ever implemented such a solution, or can anyone point me to similar solutions or to issues/caveats to consider?
I’ve been analyzing the SolrJ client and think that an approach similar to that of the LBHttpSolrClient could work – with these modifications:
The client would send each query to all relevant SolrCloud nodes in parallel (multi-threaded) and use the first answer that arrives. The requests could be generated with a web service framework such as Apache CXF.
Control over the document IDs: controlling/tracking their distribution into shards/replicas, and monitoring through ZooKeeper / cluster status (e.g. as returned from queries). Then, based on the cluster setup configuration and the current status (including ZooKeeper queries), the SolrJ client could send queries to exactly those nodes that should be alive and relevant for a given query.
Notifying SolrJ: It would be great if SolrJ could be notified of cluster changes or of services (ZooKeeper / Solr / Ranger etc.) that are temporarily unavailable, so that no time is lost on them.
Adding monitoring/alerting: Ideally, the SolrJ client would record the response times of all answers and report them, for each node and for ZooKeeper, to a monitoring component (Ambari, Atlas, log, monitoring/alerting database, e-mail, etc.).
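To make the "query all nodes in parallel, use the first answer" idea and the per-node timing more concrete, here is a minimal sketch in plain Java. It deliberately does not use SolrJ: the class `HedgedQuery`, the method `firstAnswer`, the map `lastLatencyMs`, and the `request` function are all hypothetical names for illustration, and `request` stands in for whatever call actually queries a single node. It relies on the standard `ExecutorService.invokeAny`, which returns the first successfully completed result and cancels the remaining tasks.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import java.util.concurrent.Callable;
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.TimeUnit;
import java.util.function.Function;

/** Sketch only: send one query to several nodes in parallel and keep the first answer. */
final class HedgedQuery {

    /** Last observed latency per node in milliseconds (only for nodes that answered). */
    static final Map<String, Long> lastLatencyMs = new ConcurrentHashMap<>();

    /**
     * @param nodes     hypothetical node identifiers, e.g. URLs taken from the cluster state
     * @param request   hypothetical function that queries one node and returns its response
     * @param timeoutMs upper bound for the whole fan-out in milliseconds
     */
    static <R> R firstAnswer(List<String> nodes, Function<String, R> request, long timeoutMs)
            throws Exception {
        // One thread per node; nodes must be non-empty.
        ExecutorService pool = Executors.newFixedThreadPool(nodes.size());
        try {
            List<Callable<R>> tasks = new ArrayList<>();
            for (String node : nodes) {
                tasks.add(() -> {
                    long start = System.nanoTime();
                    R response = request.apply(node);
                    // Record the timing so it could later be reported to a monitoring component.
                    lastLatencyMs.put(node,
                            TimeUnit.NANOSECONDS.toMillis(System.nanoTime() - start));
                    return response;
                });
            }
            // Returns the first successful result; throws if all tasks fail or the timeout expires.
            return pool.invokeAny(tasks, timeoutMs, TimeUnit.MILLISECONDS);
        } finally {
            // Interrupt any requests that are still running.
            pool.shutdownNow();
        }
    }
}
```

In a real client, `request` would wrap a query against a per-node SolrJ client, and `nodes` would be the live replicas for the shards relevant to the query. One caveat to weigh: sending every query to N nodes multiplies the query load on the cluster by roughly N, which can itself increase the share of slow responses.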
Any suggestions?