I'm using Examine in Umbraco to query the Lucene index of content nodes. I have a field "completeNodeText" that is the concatenation of all the node properties (to keep things simple and avoid searching across multiple fields).

I'm accepting user-submitted search terms. When the search term is multiple words (i.e., "firstterm secondterm"), I want the resulting query to be an OR query: bring me back results where completeNodeText contains firstterm OR secondterm.

I want:

{+completeNodeText:"firstterm ? secondterm"}

but instead, I'm getting:

{+completeNodeText:"firstterm secondterm"}

If I search for "firstterm OR secondterm" instead of "firstterm secondterm", then the generated query is correctly: {+completeNodeText:"firstterm ? secondterm"}

I'm using the following API calls:

var searcher = ExamineManager.Instance.SearchProviderCollection["ExternalSearcher"];
var searchCriteria = searcher.CreateSearchCriteria();
var query = searchCriteria.Field("completeNodeText", term).Compile();

Is there an easy way to force Examine to generate this "OR" query? Or do I have to bypass the Examine fluent query API entirely and construct the raw query myself, calling the StandardAnalyzer to tokenize the user input and concatenating a query by iterating through the tokens?

1 Answer

I don't think that question mark means what you think it means.

It looks like you are generating a PhraseQuery, but you want two disjoint TermQueries. In Lucene query syntax, a phrase query is enclosed in quotes.

"firstterm secondterm"

A phrase query looks for precisely that phrase, with the two terms appearing consecutively and in order. Placing an OR inside a phrase query does not perform any sort of boolean logic; it is treated as the word "OR". Because "or" is one of the StandardAnalyzer's default stop words, it is then removed during analysis. The question mark is a placeholder used in PhraseQuery.toString() to represent a removed stop word (see LUCENE-1396). You are still performing a phrase query, but now it expects a three-word phrase: firstterm, followed by a removed stop word, followed by secondterm.

To simply search for two separate terms, get rid of the quotes.

 firstterm secondterm

This will match any document containing either of those terms (with a higher score given to documents containing both).

femtoRgon
  • I didn't realize that meant it was a phrase query. I guess what I want to do is have Examine generate a query that is an OR of each token in the original user-submitted text. – Tom Zahner Sep 10 '13 at 15:59
  • But I want Examine to use the StandardAnalyzer to tokenize everything so that I don't have to manually tokenize the user input myself (by splitting on ' ' and filtering out stop words) and then generate individual "OR" field queries for each of these manually generated tokens. Is it possible to have Examine/Lucene take the token stream and then generate individual OR field queries for each token? – Tom Zahner Sep 10 '13 at 16:08
  • The QueryParser should handle that for you. Did you read the [query syntax documentation](http://lucene.apache.org/core/2_9_4/queryparsersyntax.html) I linked to? Is there some specific problem you are seeing here? – femtoRgon Sep 10 '13 at 17:35
  • I think my problem is that the Examine fluent API `Examine.LuceneEngine.SearchCriteria.LuceneSearchCriteria.Field(string fieldName, string fieldValue)` is generating a phrase query instead of two OR term queries. It uses the StandardAnalyzer to tokenize the input text (which is what I want it to do), but it doesn't generate the exact type of query that I want. I don't see a way to change this behavior without re-implementing the fluent API, so I think I will just have to manually generate the raw query `completeNodeText:(firstterm OR secondterm)` – Tom Zahner Sep 10 '13 at 19:54
  • (cont'd) Which means either string concatenation / manually tokenizing the input string OR using a different strongly-typed query API (other than the Examine fluent API) ? – Tom Zahner Sep 10 '13 at 19:55
  • Ah, I see. Yes, I believe you need to use `RawQuery`. The Field query isn't run through a QueryParser, so it won't generate multiple terms from your string (i.e., Lucene query syntax doesn't come into play for it). One note, if it simplifies things: `completeNodeText:(firstterm secondterm)` is adequate, since `OR` is effectively the default operator. – femtoRgon Sep 10 '13 at 20:19
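
The raw-query approach the comments settle on could be sketched roughly as follows. This is a minimal illustration, not Examine's own API: `OrQueryBuilder` and `Build` are hypothetical names, and the sketch assumes that a raw query is parsed by a QueryParser using the searcher's analyzer, so splitting on whitespace is enough and stop-word handling is left to the parser. Tokens are lower-cased so that user-typed `AND`, `OR`, or `NOT` are not read as operators, and characters from the Lucene 2.9 query syntax list are backslash-escaped.

```csharp
using System;
using System.Linq;
using System.Text;

// Hypothetical helper (not part of Examine): builds "field:(term1 term2 ...)".
public static class OrQueryBuilder
{
    // Characters with special meaning in the Lucene 2.9 query syntax.
    private const string SpecialChars = "+-&|!(){}[]^\"~*?:\\";

    // Backslash-escape special characters so the token is treated as literal text.
    private static string Escape(string token)
    {
        var sb = new StringBuilder(token.Length * 2);
        foreach (var c in token)
        {
            if (SpecialChars.IndexOf(c) >= 0) sb.Append('\\');
            sb.Append(c);
        }
        return sb.ToString();
    }

    // Returns null when the input contains no tokens, so the caller can skip the search.
    public static string Build(string fieldName, string userInput)
    {
        var tokens = (userInput ?? string.Empty)
            .Split((char[])null, StringSplitOptions.RemoveEmptyEntries) // split on any whitespace
            .Select(t => Escape(t.ToLowerInvariant()))
            .ToArray();

        if (tokens.Length == 0) return null;

        // OR is effectively the default operator, so a space between terms is enough.
        return fieldName + ":(" + string.Join(" ", tokens) + ")";
    }
}

// Usage with the searcher from the question (assumes Examine is referenced):
//   var raw = OrQueryBuilder.Build("completeNodeText", term);
//   // raw == "completeNodeText:(firstterm secondterm)" for "firstterm secondterm"
//   var query = searcher.CreateSearchCriteria().RawQuery(raw);
//   var results = searcher.Search(query);
```

Keeping the string building separate from the Examine calls makes the escaping easy to test on its own; whether the parsed query behaves as expected should be checked against the Examine and Lucene.Net versions actually in use.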