Summary: For Lucene 6.x with the default BM25 similarity, set disableCoord to true; otherwise, leave it at false.
Coord is a scoring feature of BooleanQuery that counteracts some of TF/IDF's shortcomings with over-saturated terms. It is only relevant when there are multiple should clauses. In your first scenario, all sub-queries must match, so no coord factor is involved and the disableCoord parameter has no effect.
In the second scenario, with multiple should clauses, a BooleanQuery sums up all the sub-scores to determine which of the documents is a better match. The idea is that a doc that matches more sub-queries is a better match and thus gets a better score.
Now, imagine a query x OR y and a document that has 1000 occurrences of x but none of y. With TF/IDF, due to the high termFreq(x), the sub-score of x is very high, and so is the resulting score of x OR y. This can push the document ahead of others that match both sub-queries, which is not what BooleanQuery was meant to do. This is where coord comes into play.
The coord factor is calculated per document as (number of should clauses matched) / (total number of should clauses in the query). This gives a number in [0..1] that represents the fraction of sub-queries that match the document. The summed score of all sub-queries is then multiplied by this coord factor. A document matching all should clauses keeps its original score, the sum of all sub-scores, while a document matching only x out of x OR y has its score halved, counteracting the high score that the over-saturated x gave. If you disable coord, this factor is not calculated and the final score is just the sum of the sub-scores.
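The arithmetic above can be sketched in a few lines of plain Java. This is only an illustration of the formula, not Lucene's actual scoring code: the class CoordSketch, its methods, and the sub-score values are all made up for the example, and a non-matching clause is represented by a sub-score of 0.

```java
// Illustrative sketch only; names and numbers are hypothetical, not Lucene API.
final class CoordSketch {

    /** Coord factor: matched should clauses divided by total should clauses. */
    static double coord(int matchedClauses, int totalClauses) {
        if (totalClauses <= 0) {
            throw new IllegalArgumentException("totalClauses must be positive");
        }
        return (double) matchedClauses / totalClauses;
    }

    /**
     * Sums the sub-scores and, unless coord is disabled, multiplies the sum
     * by the coord factor. A sub-score of 0 means "clause did not match".
     */
    static double score(double[] subScores, boolean disableCoord) {
        double sum = 0.0;
        int matched = 0;
        for (double s : subScores) {
            if (s > 0.0) {
                sum += s;
                matched++;
            }
        }
        return disableCoord ? sum : sum * coord(matched, subScores.length);
    }

    public static void main(String[] args) {
        double[] onlyX = {8.0, 0.0}; // over-saturated x, no match for y
        double[] both = {3.0, 3.0};  // moderate matches for x and y

        System.out.println("coord disabled: onlyX=" + score(onlyX, true)
                + " both=" + score(both, true));
        System.out.println("coord enabled:  onlyX=" + score(onlyX, false)
                + " both=" + score(both, false));
    }
}
```

With these made-up numbers, the document matching only x ranks first when coord is disabled (8.0 vs. 6.0) and second when it is enabled (4.0 vs. 6.0).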
Coord was designed with TF/IDF in mind, and other similarity formulas might not suffer from over-saturated terms. BM25, which became the default similarity in Lucene 6.0, handles such over-saturated terms much better, controlled by its k1 parameter. Instead of a score that keeps growing with increasing termFreq, the BM25 score approaches a limit and levels off. It gives hardly any extra boost to a document with termFreq=1000 over one with termFreq=5, but it does for termFreq=1 over termFreq=0. Britta Weber gave a talk at Buzzwords about this, in which she explains the saturation curve.
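To make the saturation concrete, here is a rough sketch comparing only the term-frequency parts of the two formulas. It assumes the square-root tf of Lucene's classic TF/IDF similarity and the textbook BM25 term tf * (k1 + 1) / (tf + k1) with document-length normalization left out; IDF and norms are ignored entirely, and the class and method names are invented for the example.

```java
// Illustrative sketch only; ignores IDF and length normalization.
final class SaturationSketch {

    /** Classic TF/IDF term-frequency part: keeps growing with termFreq. */
    static double classicTf(int termFreq) {
        return Math.sqrt(termFreq);
    }

    /** BM25 term-frequency part: approaches the limit k1 + 1 as termFreq grows. */
    static double bm25Tf(int termFreq, double k1) {
        return termFreq * (k1 + 1.0) / (termFreq + k1);
    }

    public static void main(String[] args) {
        double k1 = 1.2; // common default value for k1
        for (int tf : new int[] {0, 1, 5, 1000}) {
            System.out.printf("termFreq=%d classic=%.3f bm25=%.3f%n",
                    tf, classicTf(tf), bm25Tf(tf, k1));
        }
    }
}
```

In this sketch the BM25 value can never exceed k1 + 1, so going from termFreq=5 to termFreq=1000 adds very little, while the classic value keeps climbing.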
That means that, for BM25, the coord factor is no longer necessary and might actually lead to counter-intuitive results. It has already been removed from Lucene master and will be gone in 7.0.
If you're using Lucene 6.x with the default similarity BM25, it's a good idea to always disable coord, as BM25 does not suffer from the problem coord worked around. If you're using TF/IDF (regardless of 6.x or not), disabling coord will only give you more predictable results as long as your term frequencies are evenly distributed (which they practically never are), and setting disableCoord to false (the default) will give results that are intuitively better.