I've got an index of about 500,000 documents, and about 10 of these documents contain the title "at the moon" ('title' field) and the tag "nasa" ('tag' field). When I search for "at the moon nasa", these documents come up quite far down in the search results. This is because the title field does not get boosted, while the tag field gets boosted quite a bit. So other documents with the tag 'nasa' take precedence over the documents that almost match the entire query through the title field.

However, even though Solr has no way of knowing it, the query "at the moon nasa" almost matches the document title "at the moon". If I remove the "nasa" part from the query, the documents come up at the top.

Is there some way to tell Solr to do some sort of approximate phrase query? Would it make sense to implement some sort of gram-ish search through the bq parameter, where I would split the search phrase up into word combinations such as:

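// PHP-ish pseudocode: one boost query per contiguous word combination,
// boosted by the number of words it contains
$bq[] = 'title:"at the"^2';
$bq[] = 'title:"at the moon"^3';
$bq[] = 'title:"at the moon nasa"^4';
$bq[] = 'title:"the moon"^2';
$bq[] = 'title:"the moon nasa"^3';
$bq[] = 'title:"moon nasa"^2';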

Would this make sense at all, and would it make sense to boost documents according to how large a part of the query they match?
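
To make the idea concrete, the word combinations could be generated along these lines. This is only a Python sketch: the function name `shingle_boosts`, the `min_words` parameter, and the rule that the boost equals the number of words in the phrase are assumptions made for illustration.

```python
def shingle_boosts(query, field="title", min_words=2):
    """Build one boosted phrase clause per contiguous run of query words.

    Assumption: the boost is simply the number of words in the phrase,
    so longer matches count for more.
    """
    words = query.split()
    clauses = []
    for start in range(len(words)):
        # Every run starting at `start` that has at least `min_words` words.
        for end in range(start + min_words, len(words) + 1):
            phrase = " ".join(words[start:end])
            boost = end - start
            clauses.append(f'{field}:"{phrase}"^{boost}')
    return clauses


# Each entry would be sent as a separate bq parameter.
bq = shingle_boosts("at the moon nasa")
for clause in bq:
    print(clause)
```

A query of n words produces n(n-1)/2 clauses this way, so the list grows quickly for long queries.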

sbrattla

1 Answer

Before you do anything else, try using eDisMax with the pf3 parameter. That does the 3-grams for you automatically.
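
As a rough sketch of what such a request might look like, the Python snippet below builds the query string for an eDisMax search. The parameter names (`defType`, `qf`, `pf`, `pf2`, `pf3`) are standard eDisMax parameters; the boost values, and the choice to combine `pf3` with `pf` and `pf2`, are assumptions for illustration and would need tuning against your own index.

```python
from urllib.parse import urlencode

# Assumed boost values; only the parameter names come from eDisMax itself.
params = {
    "defType": "edismax",
    "q": "at the moon nasa",
    "qf": "title tag^5",  # fields to search; tag is boosted, as in the question
    "pf": "title^4",      # boost titles matching the whole query as a phrase
    "pf2": "title^2",     # boost titles matching any two adjacent query words
    "pf3": "title^3",     # boost titles matching any three adjacent query words
}

# Append this to your core's select handler URL.
query_string = urlencode(params)
print(query_string)
```

With this setup a title of "at the moon" matches the `pf3` shingle "at the moon" and the `pf2` shingles "at the" and "the moon", so it picks up extra score without any hand-built bq clauses.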

You may also be interested in the recent vifun project, which helps visualize the effects of various parameters.

Alexandre Rafalovitch