Let's say I have a binary field `checked`. Let's also assume that 3 documents out of 10 have checked:1 and the others have checked:0.

When I search in Lucene:

checked:1 - returns correct result (3)
checked:0 - returns correct result (7)
-checked:1 - returns correct result (7)
-checked:0 - returns correct result (3)

BUT

-(-(checked:1)) - suddenly returns the wrong result (10, i.e. the entire data set).

Any idea why the Lucene query parser behaves so strangely?

NeatNerd
  • You must have at least one positive term in your query. Is what you pasted the whole query, or just a part of it? – mindas Apr 30 '14 at 11:33
  • This is the whole query. What do you mean? Could you please elaborate, maybe in an answer? – NeatNerd Apr 30 '14 at 11:35

2 Answers


Each Lucene query has to contain at least one positive term (either MUST/+ or SHOULD) so it matches at least one document. So your queries -checked:1 and -checked:0 are invalid, and I am surprised you are getting any results.

These queries should (most likely) look like this:

  • +*:* -checked:1
  • +*:* -checked:0
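To make the "at least one positive term" rule concrete, here is a minimal sketch in Python. It is a toy model over sets of document IDs, not the Lucene API: the names `boolean_query`, `ALL_DOCS` and `CHECKED` are made up for illustration, and scoring is ignored entirely. The assumption is that negative clauses only remove documents from whatever the positive clauses selected, so a query with no positive clause has nothing to remove from.

```python
# Toy model of Lucene-style boolean matching over document IDs (not the Lucene API).
ALL_DOCS = set(range(10))   # plays the role of *:*
CHECKED = {0, 1, 2}         # docs with checked:1; the other 7 have checked:0


def boolean_query(must=(), should=(), must_not=()):
    """Return the doc IDs matched by a boolean query built from sets of doc IDs."""
    if must:
        # MUST clauses: a document has to match all of them.
        candidates = set.intersection(*map(set, must))
    elif should:
        # Only SHOULD clauses: a document has to match at least one.
        candidates = set.union(*map(set, should))
    else:
        # No positive clause at all: there is nothing to select from.
        candidates = set()
    for excluded in must_not:
        # MUST_NOT clauses only filter the candidates, they never add documents.
        candidates -= set(excluded)
    return candidates


pure_negative = boolean_query(must_not=[CHECKED])                     # -checked:1
with_match_all = boolean_query(must=[ALL_DOCS], must_not=[CHECKED])   # +*:* -checked:1
print(len(pure_negative), len(with_match_all))
```

In this model the purely negative query selects no documents, while the `+*:*` variant returns the 7 documents that are not checked.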

Getting back to your problem: double negation makes no sense in Lucene. Why would you have double negation, what are you trying to query?

Generally speaking, don't look at Lucene query operators (! & |) as Boolean operators, they aren't exactly what you think they are.

mindas
  • Why does a Lucene query have to have at least one positive term? I agree double negation doesn't make sense, but why is the result not correct? Isn't -- = +? – NeatNerd Apr 30 '14 at 12:10
  • You need to read the article I linked to understand how matching/scoring works, this will answer your questions. – mindas Apr 30 '14 at 12:11
  • OK, I will. Seems like I'll have to think about implementing a custom parser, because this doesn't make much sense to me, to be honest. Thanks! – NeatNerd Apr 30 '14 at 12:36
  • As a point of interest, `-checked:1` returns the stated result because the Solr query parser has some special handling to parse that into `*:* -checked:1`. The parenthesized terms aren't handled this way, so the query becomes `*:* -(-(checked:1))`, and everything - (nothing) = everything. (Personally, I think this special syntax, as well as Lucene's AND, OR and NOT operators, may *cause* more confusion than it solves.) – femtoRgon Apr 30 '14 at 15:29
  • OP said _When I search in lucene_ (no mention of Solr) hence my doubts. Anyway, thanks for clarification. – mindas Apr 30 '14 at 15:35

After some research, trial and error, and building on the answer from mindas, I came up with a method to resolve this inconsistency. By inconsistency I mean inconsistency from a user's common-sense point of view. From an information retrieval perspective, mindas has linked an interesting article that explains why such a query makes no sense. The trick is to pair each negative expression with a MatchAllDocsQueryNode, so that the rewritten query looks like this:

 -(-(checked:1 *:*) *:*)

Then the query produces the expected result. I accomplished this by writing my own node processor class, which performs the necessary rewriting.
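The node processor itself is not shown here, so the following is only a rough Python sketch of the idea, again as a toy model over sets of document IDs rather than real Lucene code. The query representation, `evaluate` and `add_match_all` are hypothetical names, and adding the match-all clause as `+*:*` is an assumption; the exact form of the rewritten query above may differ. It compares adding a match-all clause only at the top level (the behaviour femtoRgon describes for Solr in the comments) with adding one to every purely negative group.

```python
# Toy index: 10 documents, 3 of them with checked:1 (not the Lucene API).
ALL_DOCS = frozenset(range(10))
INDEX = {
    "*:*": ALL_DOCS,
    "checked:1": frozenset({0, 1, 2}),
    "checked:0": frozenset(range(3, 10)),
}

# A query is either a term string or a list of (occur, subquery) clauses,
# where occur is "+" (MUST), "" (SHOULD) or "-" (MUST_NOT).


def evaluate(query):
    """Return the set of doc IDs matched by a query in this toy model."""
    if isinstance(query, str):
        return set(INDEX[query])
    must = [evaluate(q) for occur, q in query if occur == "+"]
    should = [evaluate(q) for occur, q in query if occur == ""]
    must_not = [evaluate(q) for occur, q in query if occur == "-"]
    if must:
        hits = set.intersection(*must)
    elif should:
        hits = set.union(*should)
    else:
        hits = set()  # purely negative group: nothing to subtract from
    for excluded in must_not:
        hits -= excluded
    return hits


def add_match_all(query, recursive=True, top=True):
    """Prepend +*:* to purely negative groups (everywhere, or only at the top)."""
    if isinstance(query, str):
        return query
    clauses = [(occur, add_match_all(q, recursive, top=False)) for occur, q in query]
    purely_negative = all(occur == "-" for occur, _ in clauses)
    if purely_negative and (recursive or top):
        clauses = [("+", "*:*")] + clauses
    return clauses


double_negation = [("-", [("-", "checked:1")])]   # -(-(checked:1))
top_only = evaluate(add_match_all(double_negation, recursive=False))
everywhere = evaluate(add_match_all(double_negation, recursive=True))
print(len(top_only), len(everywhere))
```

With the top-level-only rewrite the inner group still matches nothing, so all 10 documents come back; with the rewrite applied to every negative group, the result is the 3 checked documents.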

NeatNerd