
I am "new" to Solr (using version 8.7). I am trying to benchmark the solution, one of my tests is to send the same query several times to the same solr node on the same collection to ensure the relevance of the response times and estimate the potential differences we can observe.

My problem is that sometimes the first time the query is sent, the response time is much longer than on the following runs. An example of response times for the same query sent 10 times (in ms): 31380, 405, 423, 412, 364, 381, 383, 378, 369, 266 (this query uses sort and fq, with {!cache=false}). At first I thought it could be due to host resolution, but I still have the issue after changing the target host to the IP address of the machine. I have no other processes running on the machine, it is dedicated to the Solr benchmark, so in my opinion it is not a problem with CPU or RAM. I am using SolrCloud with 2 nodes, but the collection I am sending requests to only has 1 shard with 1 replica, so I don't think it is due to communication between nodes.
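For reference, here is roughly how I send and time the queries. This is a minimal SolrJ sketch, not my exact code: the base URL, collection name, query, fq and sort values below are placeholders for my real setup.

```java
// Minimal SolrJ (8.x) timing sketch. URL, collection, query, fq and sort
// are placeholders, not the real benchmark setup.
import org.apache.solr.client.solrj.SolrQuery;
import org.apache.solr.client.solrj.impl.HttpSolrClient;
import org.apache.solr.client.solrj.response.QueryResponse;

public class RepeatQueryBenchmark {
    public static void main(String[] args) throws Exception {
        try (HttpSolrClient client =
                 new HttpSolrClient.Builder("http://10.0.0.1:8983/solr/mycollection").build()) {
            SolrQuery query = new SolrQuery("*:*");
            query.addFilterQuery("{!cache=false}category:books"); // placeholder fq
            query.set("sort", "price asc");                       // placeholder sort

            for (int i = 0; i < 10; i++) {
                long start = System.nanoTime();
                QueryResponse rsp = client.query(query);
                long wallMs = (System.nanoTime() - start) / 1_000_000;
                // QTime is the time Solr itself spent executing the query; the
                // difference from wall-clock time is network + serialization +
                // client overhead.
                System.out.printf("run %d: wall=%d ms, QTime=%d ms%n",
                                  i, wallMs, rsp.getQTime());
            }
        }
    }
}
```

Comparing the client-side wall-clock time with the QTime Solr reports at least shows whether the extra time on the first run is spent inside Solr's query execution or somewhere else (network, serialization, client).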

Recommended things I tried:

  • disable caches (filterCache, queryResultCache and documentCache, even a custom cache named "perSegFilter" that was there by default) in solrconfig.xml, first by adding enabled="false", then by commenting out the blocks entirely (then reloading the collection, and even restarting the cluster); I keep checking Solr's cache metrics to see if they are used, and they are not;

  • use {!cache=false} everywhere I can in the query;

  • disable cold searcher.

After searching the Solr documentation and forums for a long time, I am running out of ideas. If anyone has encountered this problem, or has an idea or an explanation of how this works (maybe it is not a problem at all), that would help me greatly.

Thank you for your time

1 Answer


Update: the times you're seeing are long enough that CPU-frequency / CPU-cache / JVM "warm up" should be a minor part of the total time for the first query. That should mostly settle down within a second. I don't think this answer explains 31 sec vs. 0.4 sec. Maybe there is some application-level caching at work, unless there's some huge I/O bottleneck and disk caching explains it.

TL;DR: this answer probably doesn't explain most of the slow-first-run effect. It could if we were talking 30 ms then 0.4 ms, but we're not.


For benchmarking in general, it's very common to see "warm up" effects where the first test of something is slower.

You're stopping Solr from caching the query result, so it has to be re-computed, but it's completely normal for the computation to go faster when it was just done with the same data on the same CPU.

First, the CPU will take some time to decide to switch from idle to max turbo clock frequency (especially if it's older than Intel Skylake, which introduced lower-latency frequency control).

Second, there are many forms of caching that will speed up a repeated computation. The JVM itself will probably have profiled and JIT-compiled the hot code during the first run, so later runs can execute native code right away. Memory is probably already allocated (by the JVM) and already page-faulted in, so it doesn't need to be requested from the OS again. And branch prediction will have learned some of the branching pattern, as long as there's no long delay between queries that lets the CPU(s) drop into a really deep sleep (which would flush and power down the caches, including branch predictors).
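To see the JVM part of this warm-up in isolation, here's a toy, stand-alone demonstration (nothing Solr-specific; the workload and iteration counts are arbitrary): the first run executes interpreted while the JIT profiles it, later runs execute compiled code on already-touched memory.

```java
// Toy demonstration of JVM warm-up: the first iterations run interpreted
// (and pay for class loading and fresh allocation); later iterations run
// JIT-compiled code. Timings are illustrative only.
public class WarmupDemo {
    static long work(int n) {
        long sum = 0;
        for (int i = 0; i < n; i++) {
            sum += (long) i * i % 1234567;
        }
        return sum;
    }

    public static void main(String[] args) {
        for (int run = 0; run < 10; run++) {
            long start = System.nanoTime();
            long result = work(5_000_000);
            long us = (System.nanoTime() - start) / 1_000;
            System.out.printf("run %d: %d us (result=%d)%n", run, us, result);
        }
    }
}
```

On a typical desktop JVM the first run is a few times slower than the rest, but the gap closes within well under a second, which is exactly why this kind of effect can't account for 31 s vs. 0.4 s.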

Plus any data that needed to come from disk on the first query should now be hot in RAM, letting the OS handle open/read system calls faster. That warmth probably persists across stop/restart of a set of tests, unless you're restarting a whole VM or doing an OS-level drop_caches when you want to benchmark that cold a state. (Probably not realistic unless the normal case is querying a working set that's bigger than your RAM.)
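Here's a toy sketch of that page-cache effect (again not Solr-specific; the file path is a placeholder, and drop_caches is Linux-only and needs root):

```java
// Toy demonstration of OS page-cache warm-up: the second read of the same
// file is served from RAM. "/path/to/big/file" is a placeholder. To re-test
// the cold case on Linux, run as root between timings:
//   sync; echo 3 > /proc/sys/vm/drop_caches
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.Paths;

public class PageCacheDemo {
    public static void main(String[] args) throws Exception {
        Path file = Paths.get("/path/to/big/file"); // placeholder path
        for (int i = 0; i < 3; i++) {
            long start = System.nanoTime();
            byte[] data = Files.readAllBytes(file);
            long ms = (System.nanoTime() - start) / 1_000_000;
            System.out.printf("read %d: %d bytes in %d ms%n", i, data.length, ms);
        }
    }
}
```

If the first read takes seconds and the repeats take milliseconds, slow storage plus a cold page cache could plausibly explain a 31 s first query.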

See also Idiomatic way of performance evaluation? for more about these CPU / OS warm-up effects and other benchmarking pitfalls.


(Caveat: I know nothing about Solr, just what I can see from the tag mouseover. I'm here for the performance / cpu-architecture questions.)

Peter Cordes
  • This is also the reason why Solr has configurable warming queries - it allows you to make Solr re-read the index and run specific queries when being updated so that the file system cache, the JVM caches (JIT, memory allocated, etc.) and other internal caches are warm before answering queries. – MatsLindh Aug 09 '21 at 13:32
  • Noted. I expected this answer; as you said, restarting the whole VM or clearing the OS cache between requests is not realistic. My aim is to find requests that take around 30s to process every time I send them, to observe the differences we can expect, as we will not cache the result of all requests. According to your answer this test is not possible: the first response time is not realistic and the following ones are not relevant, as I want to simulate the first request several times. I will wait to see if someone has an explanation specific to Solr. Thanks for your answer. – Random_User Aug 10 '21 at 09:37
  • @Random_User: hmm, I didn't notice the actual numbers buried in the middle of a big paragraph the first time I read it. Your total times are longer than I was expecting for each query, and the difference is way bigger than can be explained by most of these factors, except maybe slow disk I/O. CPU frequency, initial JVM warm-up, and page faults for new memory should all mostly settle down within under a second for most workloads, so 30 s followed by 0.4 s and similar low times isn't really explained by them. Unless your data is on a slow disk or NFS mount? – Peter Cordes Aug 10 '21 at 09:46