
I have created a filter in Lucene.Net to limit the search results, and I am encountering a very strange issue: the filter works with numeric values but not with text values.

For example:

If I create a filter with a numeric value, as below, it works perfectly:

```csharp
String field = "id";
Filter LE = new QueryWrapperFilter(new TermQuery(new Term(field, "1234567")));
indexSearcher.Search(QueryMaker(searchString, searchfields), LE, coll);
```

However, if I use a value that contains letters:

```csharp
String field = "id";
Filter LE = new QueryWrapperFilter(new TermQuery(new Term(field, "ZZZOCB9X9Y")));
indexSearcher.Search(QueryMaker(searchString, searchfields), LE, coll);
```

it fails: the search returns no records.

Can somebody explain the issue to me? I have tested this numerous times before making this claim. I have read on some forums that TermQuery in Lucene versions below 3 may have this issue; however, I have upgraded to 3.0.3 and the error still persists. I badly need the filter in my program to work; otherwise I will have to move away from Lucene and find something else.

rae1
Huzaifa
  • I have tried all the other filters too, even BooleanFilter. It simply doesn't work with text values. – Huzaifa Jun 04 '13 at 16:31
  • Which `Analyzer` are you using when you index the content? It could be that the one you are using does not generate the tokens you expect (e.g. it may turn text into lowercase). Also, why are you using QueryMaker? I would think something simpler like `String field = "id"; indexSearcher.search(new TermQuery(new Term(field, "ZZZOCB9X9Y")));` would also work, although I don't use Lucene.net, only Lucene Core (Java). – Kai Chan Jun 04 '13 at 17:35
  • You shouldn't use filters if you can avoid them, as queries will give better performance, as [shown here](http://stackoverflow.com/a/6469223/1250033). Also, `TermQuery` passes the value to Lucene unanalyzed, so it has to match the indexed term literally. Can you try `QueryParser` instead? – rae1 Jun 04 '13 at 17:42
  • Also, you should consider adding a [Bounty](http://meta.stackexchange.com/a/16067) if your question doesn't receive enough attention. – rae1 Jun 04 '13 at 17:44
  • Thanks for coming here. I am using StandardAnalyzer. Does StandardAnalyzer change the case of text values? Also, as you can see, my values are a combination of letters and digits. QueryMaker is a method which returns the weight after searching the values. Also, my issue is with the filter, not with the searcher. – Huzaifa Jun 04 '13 at 17:46
  • @rae1n Sure, let me do that. Also, I have to limit my results to a specific area, which is why I am using a filter. For example, if the user searches for people named John, they should only get the Johns from their area. Does that answer your question? – Huzaifa Jun 04 '13 at 17:51
  • Again: *You shouldn't use filters if you can avoid it*. They will hurt your performance, as they do a linear search instead of a logarithmic one. Also, when you use TermQuery, it only matches the value *literally*; it won't match a different case or partial values. – rae1 Jun 04 '13 at 17:51
  • Also, can you be more specific as to what you mean by: *I have to limit my result to specific area therefore using filter* – rae1 Jun 04 '13 at 17:54
  • I was using a Filter because it's a built-in feature and easy to use. However, it looks like I am paying the price. I have a number of records in my database from different areas with different codes (like "ZZZOCB9X9Y"). Let's say a user searches for a person named John; then the user should only get results from their area code. For example, there are two Johns from two areas (ZZZOCB9X9Y and ZZZOCB77S) in my DB. The user belongs to the ZZZOCB9X9Y area, so even though the search matches two records, they should only get the John from ZZZOCB9X9Y (their area code), not the John from ZZZOCB77S. Do you see what I mean? – Huzaifa Jun 04 '13 at 18:03
  • I understand; however, in some cases caching the filters can be inefficient, as the filters can become quite large, so you might be better off using a `BooleanQuery` to search for `code == "ZZZOCB9X9Y"` and `name == "John"`, and avoid having to deal with issues such as case sensitivity. – rae1 Jun 04 '13 at 18:30
  • I think my case is the best example of what you are saying. Let me try to rework it and remove the filter. Thanks a ton for coming here. – Huzaifa Jun 04 '13 at 18:36
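To make the `BooleanQuery` suggestion concrete, here is a toy model of what "all clauses required" means for the John/area-code example. It is deliberately not the Lucene.Net API: documents are plain dictionaries from a field name to a single already-indexed term, and the `MatchAll` helper and the field names `name`/`area` are hypothetical. It only illustrates the semantics of a `BooleanQuery` whose two `TermQuery` clauses are both required (MUST), under the assumption that the query terms are in the same form as the indexed terms.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

// Toy model only (NOT the Lucene.Net API): a document is a map from field
// name to its single indexed term, and every clause is required (MUST).
public static class ToyBooleanSearch
{
    // Returns the positions of documents that contain ALL of the given
    // (field, term) pairs. Terms are compared exactly (ordinal,
    // case-sensitive), the way a TermQuery compares against indexed terms.
    public static List<int> MatchAll(
        IReadOnlyList<Dictionary<string, string>> docs,
        params (string Field, string Term)[] clauses)
    {
        var hits = new List<int>();
        for (int id = 0; id < docs.Count; id++)
        {
            bool matchesEveryClause = clauses.All(c =>
                docs[id].TryGetValue(c.Field, out var term) && term == c.Term);
            if (matchesEveryClause) hits.Add(id);
        }
        return hits;
    }
}
```

With one John in each of the two areas, requiring both the name and the area code keeps only the John from the user's area, which is what the filter was meant to achieve.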

1 Answer


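StandardAnalyzer will lowercase all the characters in your TokenStream. At index time, "ZZZOCB9X9Y" is therefore stored as the term "zzzocb9x9y", while a TermQuery is not analyzed and looks for "ZZZOCB9X9Y" literally, so nothing matches. Digits have no case, which is why "1234567" works.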

Try this:

```csharp
Filter LE = new QueryWrapperFilter(new TermQuery(new Term(field, "ZZZOCB9X9Y".ToLowerInvariant())));
```
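The mismatch can be reproduced without Lucene at all. The sketch below is a toy inverted index, not Lucene.Net: the `ToyIndex` class and its methods are hypothetical, and `Analyze` keeps only the lowercasing step of StandardAnalyzer (no tokenization or stop words). `TermLookup` plays the role of TermQuery by comparing the raw query term against the indexed terms exactly.

```csharp
using System;
using System.Collections.Generic;

// Toy model (NOT Lucene.Net): an inverted index built with a lowercasing
// "analyzer", queried by an exact, unanalyzed term lookup (like TermQuery).
public sealed class ToyIndex
{
    // term -> ids of the documents containing that term
    private readonly Dictionary<string, List<int>> _postings =
        new Dictionary<string, List<int>>();

    // Simplified stand-in for StandardAnalyzer: only the lowercasing step.
    private static string Analyze(string value) => value.ToLowerInvariant();

    // Index time: the value is analyzed before it is stored as a term.
    public void Add(int docId, string value)
    {
        string term = Analyze(value);
        if (!_postings.TryGetValue(term, out var ids))
        {
            ids = new List<int>();
            _postings[term] = ids;
        }
        ids.Add(docId);
    }

    // Query time: the term is NOT analyzed, so it must equal the indexed
    // term exactly (Dictionary's default string comparison is ordinal).
    public IReadOnlyList<int> TermLookup(string term)
    {
        if (_postings.TryGetValue(term, out var ids)) return ids;
        return Array.Empty<int>();
    }
}
```

Indexing "1234567" and "ZZZOCB9X9Y" and then looking up the original strings finds the numeric one but not the mixed one; lowercasing the query term with `ToLowerInvariant()` before the lookup, as in the fix above, makes it match.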
Jf Beaulac
  • Works like magic. Wow! Thank you so much. What a relief, and what an answer. I had been going crazy over this for a long time. Which analyzer should I use if I want to store my tokens exactly as they are? Would you recommend that? – Huzaifa Jun 04 '13 at 18:26
  • If you pass Field.Index.NOT_ANALYZED as the Field.Index constructor parameter when you create the field, it will be indexed as is. There is also KeywordAnalyzer, which has the same behavior. – Jf Beaulac Jun 04 '13 at 18:28
  • If you use NOT_ANALYZED, it will in theory be faster, especially for indexing, but probably not enough to notice. – Jf Beaulac Jun 04 '13 at 18:45
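The difference between an analyzed field and one indexed as is (NOT_ANALYZED or KeywordAnalyzer) can be sketched as the set of terms each produces for the same value. This is a toy model, not Lucene.Net: `StandardLike` is a rough, hypothetical stand-in for StandardAnalyzer (split on non-alphanumerics, then lowercase), `KeywordLike` keeps the whole value as one unchanged term, and `TermMatches` mimics a TermQuery's exact comparison.

```csharp
using System;
using System.Linq;
using System.Text.RegularExpressions;

// Toy model (NOT Lucene.Net): the terms two hypothetical "analyzers" would
// put in the index for a field value, plus an exact TermQuery-style check.
public static class ToyAnalyzers
{
    // Rough stand-in for StandardAnalyzer: split on runs of characters
    // that are not ASCII letters or digits, then lowercase each token.
    public static string[] StandardLike(string value) =>
        Regex.Split(value, "[^A-Za-z0-9]+")
             .Where(t => t.Length > 0)
             .Select(t => t.ToLowerInvariant())
             .ToArray();

    // Rough stand-in for NOT_ANALYZED / KeywordAnalyzer: the whole value
    // becomes a single term, with its original case.
    public static string[] KeywordLike(string value) => new[] { value };

    // Like TermQuery: exact, case-sensitive match against indexed terms.
    public static bool TermMatches(string[] indexedTerms, string queryTerm) =>
        indexedTerms.Contains(queryTerm);
}
```

With the keyword-style field, the original "ZZZOCB9X9Y" matches without any lowercasing on the query side. In Lucene.Net 3.0.3 the equivalent would be roughly `new Field("areaCode", code, Field.Store.YES, Field.Index.NOT_ANALYZED)`, where `areaCode` and `code` are placeholder names.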