One workaround I found is to give each term in the query a different (unique) boost
factor, and then read the boost factor of each matched term from the debug
score output to deduce which query term that part of the score came from.
For example, we can query with foo~2^3.0 bar~2^2.0
(boost matches on foo by 3.0 and matches on bar by 2.0). From the debug score output, check the boost factors:
Result 1: food bars: score <total score 1> = food * 3.0 * <other scoring terms> + bars * 2.0 * <other scoring terms>
Result 2: mars bar: score <total score 2> = bar * 2.0 * <other scoring terms>
From this it is clear that food matched with a boost factor of 3.0, and that bars as well as bar matched with a boost factor of 2.0. If we maintain a lookup dictionary of which term was given which boost to begin with, it is easy to figure out which query terms matched.
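The lookup idea can be sketched in Python roughly as follows. This is only an illustration: the names `QUERY_BOOSTS`, `build_query` and `terms_for_boosts` are made up for this example, and it assumes you have already pulled (matched token, boost) pairs out of the debug score output yourself, since the exact text of that output depends on your Solr version and similarity settings.

```python
import math

# Hypothetical lookup: the unique boost we assigned to each query term.
QUERY_BOOSTS = {"foo": 3.0, "bar": 2.0}


def build_query(boosts, fuzziness=2):
    """Build a fuzzy query string such as 'foo~2^3.0 bar~2^2.0'."""
    return " ".join(f"{term}~{fuzziness}^{boost}" for term, boost in boosts.items())


def terms_for_boosts(boosts, observed, tol=1e-6):
    """Map each (matched_token, boost) pair from the debug output to a query term.

    `observed` is assumed to be a list of pairs already extracted from the
    debug score output; tokens whose boost matches no query term are skipped.
    """
    result = {}
    for token, boost in observed:
        for term, expected in boosts.items():
            if math.isclose(boost, expected, abs_tol=tol):
                result[token] = term
    return result


if __name__ == "__main__":
    print(build_query(QUERY_BOOSTS))
    # Pairs as they would be read from "Result 1" in the example above.
    print(terms_for_boosts(QUERY_BOOSTS, [("food", 3.0), ("bars", 2.0)]))
```

The boosts have to be unique per term for this to work; two terms sharing a boost would be indistinguishable in the output.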
Two factors to consider:
- If the boost factor is 1.0, Solr debug score does not print it.
- Solr might incorporate some default boost factor for the term based on fuzzy matching, TF-IDF, etc. In this case, the boost factor that shows up will not match the boosts we supplied in the query. For this reason, we need to execute our query twice: once without any boosting (to learn the default boost for every term), and once with boosting (to see how much each one has changed).
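
A rough sketch of the two-run comparison, in Python. It rests on assumptions that are mine, not something Solr guarantees: that the default factor and the supplied boost combine multiplicatively, and that you have already extracted a per-token factor from each run's debug output into a dict. The function name `recover_supplied_boosts` is hypothetical. A token with no printed factor is treated as 1.0, to cover the first point above.

```python
import math


def recover_supplied_boosts(baseline, boosted, query_boosts, rel_tol=1e-3):
    """Match tokens to query terms by comparing boosted and unboosted runs.

    baseline:     {matched_token: factor} from the run without boosts
    boosted:      {matched_token: factor} from the run with boosts
    query_boosts: {query_term: boost} as supplied in the boosted query
    Assumes (hypothetically) that boosted factor = default factor * supplied boost.
    """
    matches = {}
    for token, factor in boosted.items():
        # A factor of 1.0 is not printed, so a missing entry is read as 1.0.
        default = baseline.get(token, 1.0)
        ratio = factor / default
        for term, boost in query_boosts.items():
            if math.isclose(ratio, boost, rel_tol=rel_tol):
                matches[token] = term
    return matches


if __name__ == "__main__":
    baseline = {"food": 0.5}               # illustrative default factor only
    boosted = {"food": 1.5, "bar": 2.0}    # illustrative values only
    print(recover_supplied_boosts(baseline, boosted, {"foo": 3.0, "bar": 2.0}))
```

If the factors do not combine this way in your setup, the ratio will not line up with the supplied boosts and the comparison has to be adapted.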
Hope this helps someone.