
Suppose there are several docs, each with a field clientID whose values range from 1 to 100.

Query 1:

FQ: **clientID:1 OR clientID:2 OR clientID:3 OR clientID:5 OR clientID:7 OR clientID:8**

Query 2:

FQ: **clientID:[1 TO 3] OR clientID:5 OR clientID:[7 TO 8]**

Question:

  1. Will there be a significant performance difference between these two queries? If so, why?
  2. Doesn't Solr preprocess multiple ORs like these by translating them into ranges?
jeffry copps

2 Answers


There might be, depending on cached entries, etc. The second query becomes two range queries and one term query combined into three boolean clauses, while the first becomes six separate boolean clauses.

The speed probably won't differ much for your example, but as the number of clauses grows, the second form keeps the number of document sets that have to be combined lower than the first. To get exact numbers, try it out: your core will be different from other people's cores.

And no, Solr won't preprocess anything. The query is handed over to Lucene to resolve as it sees fit, and a range query can be resolved differently from an exact-term query. The rewrite can't be done automatically either: there can be indexed values between the terms in your plain boolean query, so you can't translate it into a range query and expect the same result. You can't go the other way around either, since the field may not be an integer field (and even integer types differ in how they're indexed).
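To illustrate why the field type matters, here is a plain-Python simulation (not Solr code; the helper name `matches_range` is hypothetical) of an inclusive `[low TO high]` range over the same IDs stored once as integers and once as strings. String terms compare lexicographically, so a string range `[1 TO 3]` also picks up values like `10` and `29`:

```python
def matches_range(values, low, high):
    """Return the values v with low <= v <= high (inclusive, like Solr's [low TO high])."""
    return [v for v in values if low <= v <= high]

ids = list(range(1, 101))

# Integer field: numeric ordering, so [1 TO 3] means exactly 1, 2 and 3.
int_hits = matches_range(ids, 1, 3)

# String field: the same IDs as strings compare lexicographically,
# so "10".."19", "100" and "20".."29" also fall between "1" and "3".
str_hits = matches_range([str(i) for i in ids], "1", "3")

print(int_hits)       # [1, 2, 3]
print(len(str_hits))  # 24
```

This is why an OR of exact values can only be collapsed into ranges when you know the field is numeric.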

The important part is usually that the fq will be cached separately, so it's usually more important to keep it re-usable across queries.

MatsLindh

If you use the default numeric types, Solr indexes each number at more than one precision (look up TrieIntField and IntPointField in the Solr field types documentation).

So when you index 15, it is indexed both as 15 and as 10, and when you index 9, it is indexed as 9 and as 0. When you search for the range 8 to 21, the search is converted into number[8] OR number[9] OR number[10] OR number[20] OR number[21], where number[10] covers everything from 10 to 19 (in practice with binary ranges instead of decimal ones, but I hope you get the idea). So I suggest you use range queries and let Solr manage the optimizations.
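A toy version of the decimal scheme described above, as a minimal sketch: it assumes just two levels (exact value and tens bucket), whereas real Trie fields shift by a configurable number of bits (`precisionStep`). All function names here are hypothetical:

```python
def index_terms(value):
    """Terms a two-level decimal trie would index for one value: exact value and its tens bucket."""
    return [("exact", value), ("tens", value // 10 * 10)]

def range_terms(lo, hi):
    """Decompose inclusive [lo, hi] into exact terms at the edges and whole tens buckets inside."""
    terms = []
    v = lo
    while v <= hi:
        bucket = v // 10 * 10
        if v == bucket and bucket + 9 <= hi:
            terms.append(("tens", bucket))  # the whole bucket lies inside the range
            v += 10
        else:
            terms.append(("exact", v))      # edge value, match it exactly
            v += 1
    return terms

def search(values, lo, hi):
    """Match a value if any of its indexed terms appears among the query's terms."""
    wanted = set(range_terms(lo, hi))
    return [v for v in values if wanted.intersection(index_terms(v))]

print(index_terms(15))     # [('exact', 15), ('tens', 10)]
print(range_terms(8, 21))  # [('exact', 8), ('exact', 9), ('tens', 10), ('exact', 20), ('exact', 21)]
```

The range 8 to 21 needs only five term lookups instead of fourteen, and the savings grow with wider ranges.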

PointField types are the replacement for TrieFields; they are functionally similar but use different data structures to store the information. So if you have a legacy index you can keep using Trie fields, but for new indexes PointFields are recommended.

Jokin
  • I understand. Could you also say whether the two queries mentioned above will differ in execution time? You can assume the field type is TrieIntField. – jeffry copps Jan 10 '18 at 06:28
  • With those small ranges, the difference would be negligible. – Jokin Jan 10 '18 at 10:19
  • Consider a few thousand (1000 to 2000) OR clauses that could be replaced by 10 to 20 ranges. – jeffry copps Jan 10 '18 at 14:29
  • In that case, the ranges would be a much better solution. In fact, Solr has a default limit of 1024 clauses for boolean queries. You can increase the limit, but it's there for a reason. – Jokin Jan 10 '18 at 22:23
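For the thousands-of-IDs case, one option is to collapse consecutive IDs into ranges on the client before building the fq. A minimal sketch, assuming an integer field (see the string-field caveat above); `collapse`, `range_fq` and the sample ID list are hypothetical, and 1024 is Solr's default `maxBooleanClauses`:

```python
def collapse(ids):
    """Group de-duplicated integer IDs into sorted, inclusive (start, end) runs."""
    runs = []
    for i in sorted(set(ids)):
        if runs and i == runs[-1][1] + 1:
            runs[-1][1] = i          # extend the current run
        else:
            runs.append([i, i])      # start a new run
    return [tuple(r) for r in runs]

def range_fq(field, ids):
    """One clause per run: field:N for a single value, field:[a TO b] otherwise."""
    parts = []
    for start, end in collapse(ids):
        parts.append(f"{field}:{start}" if start == end else f"{field}:[{start} TO {end}]")
    return " OR ".join(parts)

MAX_BOOLEAN_CLAUSES = 1024  # Solr's default maxBooleanClauses

print(range_fq("clientID", [1, 2, 3, 5, 7, 8]))
# clientID:[1 TO 3] OR clientID:5 OR clientID:[7 TO 8]

# Hypothetical input: 2000 IDs forming 20 runs of 100 consecutive values.
big = [r * 1000 + k for r in range(20) for k in range(100)]
print(len(big), len(collapse(big)))  # 2000 20
print(len(big) > MAX_BOOLEAN_CLAUSES, len(collapse(big)) > MAX_BOOLEAN_CLAUSES)  # True False
```

The plain OR form would need 2000 clauses and exceed the default limit, while the collapsed form needs only 20.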