
I search for the game "Mass Effect 2":

http://localhost:8085/solr/select/?defType=edismax&qf=title&q=Mass+Effect+2&mm=1

Besides "Mass Effect 2" and "The Showdown Effect", it also finds titles such as "Borderlands 2" and "Prototype 2", and other games with "2" in their name. I want to exclude documents that matched only on "2".

The title field is defined as:

<field name="title" type="text" indexed="true" stored="true" multiValued="false" />
Ivan Virabyan

2 Answers

1

This isn't an exact answer, but it's an easy one that could be OK for your case. You can use the minimum-should-match (mm) parameter of the edismax query parser: if there's a number in your query, raise mm, for example to 2.

http://wiki.apache.org/solr/DisMaxQParserPlugin#mm_.28Minimum_.27Should.27_Match.29
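
As a rough illustration of raising mm only when the query contains a number, here is a minimal Python sketch. The helper name `build_params` is hypothetical, and the host, port, and `title` field are taken from the question; it only builds the request URL and does not contact Solr.

```python
from urllib.parse import urlencode

def build_params(user_query: str) -> dict:
    """Build edismax parameters, raising mm when a bare number is present."""
    tokens = user_query.split()
    has_number = any(t.isdigit() for t in tokens)
    # With a number in the query, require two matching terms so that a
    # title cannot match on the number alone.
    mm = 2 if has_number and len(tokens) > 1 else 1
    return {"defType": "edismax", "qf": "title", "q": user_query, "mm": mm}

# Host and port are the ones used in the question.
url = "http://localhost:8085/solr/select/?" + urlencode(build_params("Mass Effect 2"))
print(url)
```

Note that with mm=2 a title has to match at least two of the query terms, so titles that match only a single word are dropped as well.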

Considering that the use case, rather than "exclude number-only matches", can be stated as "use one term only for boosting", you can rewrite the query and use a nested query for the score, as described in http://searchhub.org/2009/03/31/nested-queries-in-solr. Your query could be written as:

 text:Mass Effect OR _query_:"{!dismax mm=2 }Mass effect 2"

The idea is to run the main query without the number and to add the full query, number included, as a nested query that boosts documents matching the number.
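
The nested-query idea can be sketched in Python as below. This is untested against a real Solr instance and rests on assumptions: the helper name `nested_boost_query` is hypothetical, the outer query is parsed by the default lucene parser (which supports the `_query_:` hook), the field is the `title` field from the question, and the user input contains no quotes or other special characters that would need escaping.

```python
def nested_boost_query(user_query: str, field: str = "title", mm: int = 2) -> str:
    """Search on the words only; add the full query as a nested boost clause."""
    tokens = user_query.split()
    words = [t for t in tokens if not t.isdigit()]
    # Nothing to split off: all tokens are numbers, or none are.
    if not words or len(words) == len(tokens):
        return f"{field}:({user_query})"
    # Main clause: words only, so a document cannot match on the number alone.
    main = f"{field}:({' '.join(words)})"
    # Nested clause: the full query, number included, contributes extra score.
    nested = f'_query_:"{{!dismax qf={field} mm={mm}}}{user_query}"'
    return f"{main} OR {nested}"

print(nested_boost_query("Mass Effect 2"))
```

With this shape, "Borderlands 2" matches neither clause, while "Mass Effect 2" matches both and should therefore score above "Mass Effect".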

Jokin
  • Consider this query: "Mass Effect 2". According to your answer I should set mm=2. Then Solr wouldn't find "Showdown Effect", because it contains neither the term "2" nor "Mass". So basically I need Solr to take into account only words, not numbers. Thanks anyway – Ivan Virabyan Jan 28 '13 at 11:50
  • That's why I said it could be OK. If the number is not needed, why include it in the search at all? You could ignore it, and searching with only "mass effect" would give the expected results. Sometimes it's easier if you switch perspective: in which cases do you want to take the number into account? – Jokin Jan 28 '13 at 14:47
  • I have "Mass Effect", "Mass Effect 2", and "Mass Effect 3" in the database, and I want "Mass Effect 2" to rank first. If I remove the number from the search query, "Mass Effect" is the best match. I don't need the number on its own, only when it appears with other words from the query. – Ivan Virabyan Jan 29 '13 at 08:48
  • Well, in that case you could use nested queries, http://searchhub.org/2009/03/31/nested-queries-in-solr/ something like: text:Mass Effect OR _query_:"{!dismax mm=2 }Mass effect 2" It's not tested, but the idea is to make the query without the number and include the number as a nested query to boost the number. – Jokin Jan 29 '13 at 11:21
  • Yes, a subquery is what I really needed! Thank you. Could you please update your answer? – Ivan Virabyan Jan 29 '13 at 11:55
  • Updated. As I said, sometimes you just need a change in perspective. – Jokin Jan 29 '13 at 16:17
0

Would the 'pf' phrase boost give you a better result than a quoted phrase search? As I understand it, it boosts documents in which all the query terms appear in close proximity, not necessarily in the same order.
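
For reference, a request using `pf` might be built as in the Python sketch below. The helper name `phrase_boost_params` and the slop value passed as `ps` are assumptions for illustration; the host, port, and `title` field come from the question.

```python
from urllib.parse import urlencode

def phrase_boost_params(user_query: str, slop: int = 2) -> dict:
    """edismax parameters that add a phrase boost on the title field."""
    return {
        "defType": "edismax",
        "qf": "title",    # field used for matching
        "pf": "title",    # field used for the phrase boost
        "ps": slop,       # phrase slop applied to the pf boost (assumed value)
        "q": user_query,
        "mm": 1,
    }

print("http://localhost:8085/solr/select/?" + urlencode(phrase_boost_params("Mass Effect 2")))
```

`pf` only changes scoring, so documents that match on a single term are still returned.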

Alexandre Rafalovitch
  • `pf` gives a boost to those documents, but it doesn't cut down the irrelevant ones. By irrelevant I mean documents that contain only a digit from the query. I tried to filter by score, but when using 'pf' I have documents 'Mass Destruction' and 'Borderlands 2' with roughly the same score – Ivan Virabyan Jan 24 '13 at 06:06