
I'm trying to create a search engine that uses Lucene-syntax boolean queries to search against different third-party APIs. It all seems to work fine: the user enters the query in a text field, and I parse it using https://www.npmjs.com/package/lucene-query-parser to check that it's actually a valid Lucene query. However, I have now received a request to support a query with multiple ORs in a proximity search. Basically, they want to search on A, B or C, where any of those should be close to Z, X or Y. The normal Lucene proximity search looks like this:

"a b"~20 

This gives me all hits where a is within 20 words of b. I have been searching for the syntax for this kind of multiple-OR proximity search in Lucene for a couple of days now but haven't found anything. Is it even possible?

I have tried things like this:

"a OR b, x OR z"~20

"a b OR x z"~20

"a, b OR x, z"~20

But none of them work. Thanks in advance!

Daniel Gustafsson
  • Someone may prove me wrong, of course, but I do not believe the classic query parser is flexible enough to handle this, except by providing every required combination separately: `"a x"~20 OR "a y"~20 OR "a z"~20 OR "b x"~20 OR ...` - and even worse (if I recall correctly), the order of the terms is important: `"a x"~20` is not the same as `"x a"~20` (so, does x _follow_ a in the document, or is it the other way round?). – andrewJames Jun 15 '21 at 16:13
  • As an alternative, [span queries](https://lucene.apache.org/core/8_7_0/core/org/apache/lucene/search/spans/package-summary.html#package.description) are a lot more powerful - but require using the API to build them. – andrewJames Jun 15 '21 at 16:13
  • @andrewjames you offer such speedy support of Lucene questions that I'd give you some of my SO points if I could. Impressed. (tip of the hat) – RonC Jun 15 '21 at 17:36
  • @ronc - thank you for your kind words. I wish I had been able to provide a better "here's how to do it" answer, instead of "I don't think you can". – andrewJames Jun 16 '21 at 02:29
  • Thanks guys! I guess I need to change approach on this one :( – Daniel Gustafsson Jun 16 '21 at 06:54
  • @andrewjames Often an "I don't think you can" answer is very helpful. You have a lot of experience with Lucene so if you don't think it can be done that way then it probably can't. That hunch helps the person not waste too much time searching for something that doesn't exist. "I don't think you can." is still the experience of one developer helping another developer. – RonC Jun 16 '21 at 11:24
  • Minor correction: To implement this functionality using the API, I would use [MultiPhraseQuery](https://lucene.apache.org/core/8_8_2/core/index.html), not span queries. (But I still cannot find any way to do this using the classic query parser.) – andrewJames Jun 18 '21 at 18:12
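
The brute-force workaround from the first comment (one proximity phrase per pair of terms, joined with OR) does not have to be typed by hand; it can be generated. Below is a minimal TypeScript sketch of that idea, not a confirmed solution. `expandProximity` is a hypothetical helper name and is not part of lucene-query-parser. It assumes every term is a plain single word that needs no escaping. The `bothOrders` flag is an assumption added because the comment suggests term order may matter.

```typescript
// Hypothetical helper (not part of any library): expands two groups of
// alternative terms into an OR of classic proximity phrases, one per pair.
// Assumes each term is a plain word with no quotes or special characters.
function expandProximity(
  left: string[],
  right: string[],
  slop: number,
  bothOrders: boolean = false,
): string {
  const clauses: string[] = [];
  for (const l of left) {
    for (const r of right) {
      clauses.push(`"${l} ${r}"~${slop}`);
      if (bothOrders) {
        // Also emit the reversed phrase, in case term order matters.
        clauses.push(`"${r} ${l}"~${slop}`);
      }
    }
  }
  return clauses.join(" OR ");
}

const query = expandProximity(["a", "b"], ["x", "z"], 20);
console.log(query);
// prints: "a x"~20 OR "a z"~20 OR "b x"~20 OR "b z"~20
```

The number of clauses grows as the product of the two group sizes (doubled with `bothOrders`), so this is probably only practical for small groups of terms.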
