I am trying to run a simple query using sunspot/solr, but I get no results when my query string contains the word "of".

To be more specific:

When I query "University of Thessaloniki" solr returns no hits, but when I query "University Thessaloniki" it does.

Here are the logs:

Sep 29, 2012 10:24:56 AM org.apache.solr.core.SolrCore execute
INFO: [] webapp=/solr path=/select params={fl=*+score&start=0&q=University+of+Thessaloniki&qf=status_code_text+pi_details_text+other_party_name_text+contact_details_text+other_pi_details_text+sending_or_receiving_text+start_at_str_text+materials_text&wt=ruby&fq=type:Mta&defType=dismax&rows=10000} hits=0 status=0 QTime=8 

Sep 29, 2012 10:25:09 AM org.apache.solr.core.SolrCore execute
INFO: [] webapp=/solr path=/select params={fl=*+score&start=0&q=University+Thessaloniki&qf=status_code_text+pi_details_text+other_party_name_text+contact_details_text+other_pi_details_text+sending_or_receiving_text+start_at_str_text+materials_text&wt=ruby&fq=type:Mta&defType=dismax&rows=10000} hits=9 status=0 QTime=5 

When I make the same query directly on the admin interface of sunspot/solr (http://localhost:8981/solr/admin/analysis.jsp?highlight=on) it highlights matches.

Can you please help me find my mistake?

Thanks in advance, Panayotis

p.matsinopoulos

1 Answer

You're using the dismax query parser, which lets you configure the minimum should match (`mm`) parameter. The default value is 100%, meaning that all the clauses must match. Apparently your documents don't contain the word `of`. If that's the case, you just need to configure the minimum should match parameter, keeping in mind that its behaviour changes slightly depending on the Solr version you're using. On the other hand, if you think your documents do contain the word `of`, I'd suggest checking how you are indexing them. Is it possible that you're applying a stopword filter at index time but not at query time?
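
As a rough illustration of the minimum should match setting, the fragment below sketches how a default `mm` value might be declared in `solrconfig.xml`. It assumes the `/select` handler seen in the logs is a `solr.SearchHandler` and that setting a handler-level default is acceptable; the value `2<-1` (written `2&lt;-1` in XML) is only an example meaning "with up to two clauses all must match, with more than two all but one must match". The same value could instead be sent as the `mm` request parameter.

```xml
<!-- Sketch only: handler name taken from the logs, mm value chosen for illustration -->
<requestHandler name="/select" class="solr.SearchHandler">
  <lst name="defaults">
    <str name="defType">dismax</str>
    <!-- the less-than sign must be escaped inside XML -->
    <str name="mm">2&lt;-1</str>
  </lst>
</requestHandler>
```

With a setting like this, a three-word query such as "University of Thessaloniki" would need only two of its clauses to match. It works around the symptom, though; it does not fix an index-time/query-time analysis mismatch.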

javanna
  • I believe that the answer is somewhere around what you are saying. My documents contain the phrase "University of Thessaloniki" and each of these 3 words separately too. So... let us look at your second suggestion about the stopword filter. Yes, it seems that I am applying a "stopword" filter (which includes the word "of") at index time, but I do not apply it at query time. What I apply at index time are: StandardFilterFactory, LowerCaseFilterFactory, StopFilterFactory with ignoreCase true, NGramFilterFactory with min 3 and max 30. At query time, on the other hand, I only apply the first 2. Any clue? – p.matsinopoulos Sep 29 '12 at 19:41
  • You do need to apply the stopword filter at query time. In fact, you're not indexing the term `of` since it's a stopword, but you are querying for it; that's why you don't get any results back. If you apply the stopword filter to the query, the `of` term will be removed from the query too. – javanna Sep 29 '12 at 19:51
  • So what does the final solution (config) look like then? – Trip Jun 05 '13 at 14:33
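
The thread does not show the final configuration, so the following is only a sketch of what the field type in `schema.xml` might look like once the stopword filter is applied on both sides. The filter chain follows the one described in the comments above; the field type name `text`, the `StandardTokenizerFactory` tokenizer and the `stopwords.txt` file name are assumptions, not taken from the original post.

```xml
<!-- Sketch only: type name, tokenizer and stopword file name are assumed -->
<fieldType name="text" class="solr.TextField">
  <analyzer type="index">
    <tokenizer class="solr.StandardTokenizerFactory"/>
    <filter class="solr.StandardFilterFactory"/>
    <filter class="solr.LowerCaseFilterFactory"/>
    <filter class="solr.StopFilterFactory" ignoreCase="true" words="stopwords.txt"/>
    <filter class="solr.NGramFilterFactory" minGramSize="3" maxGramSize="30"/>
  </analyzer>
  <analyzer type="query">
    <tokenizer class="solr.StandardTokenizerFactory"/>
    <filter class="solr.StandardFilterFactory"/>
    <filter class="solr.LowerCaseFilterFactory"/>
    <!-- same stopword filter as at index time, so "of" is dropped from the query too -->
    <filter class="solr.StopFilterFactory" ignoreCase="true" words="stopwords.txt"/>
  </analyzer>
</fieldType>
```

After changing the schema, Solr typically needs a restart. Because only the query-side analyzer changed here, existing documents should not need reindexing, but reindexing is the safe choice if the index-side chain is touched as well.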