Lucene MultiPhraseQuery search with wildcard

Question

I have been trying to build a Lucene search query where entering "Foo B" would return "Foo Bar", "Foo Bear", "Foo Build", etc., but would not return a record with an ID of "Foo" and the word "Bar" in, say, its 'description' field.

I have looked into MultiPhraseQuery, but it never returns any results. Below is what I have been trying:

        Term firstTerm = new Term("jobTitle", "Entry");
        Term secondTerm = new Term("jobTitle", "Artist");

        // Both terms are passed as one array in a single add() call
        Term[] tTerms = new Term[]{firstTerm, secondTerm};
        MultiPhraseQuery multiPhrasequery = new MultiPhraseQuery();
        multiPhrasequery.add(tTerms);

        // Run the Lucene query through Hibernate Search
        org.hibernate.Query hibQuery = fullTextSession.createFullTextQuery(multiPhrasequery, this.type).setSort(sort);
        results = hibQuery.list();

score 0 · Accepted Answer · answered Aug 06 '13 at 15:29

The most likely problem is capitalization. "Entry" and "Artist" are not passed through a query parser, so they are never run through an analyzer, and the match is case-sensitive. The field you are indexing is probably analyzed with an analyzer that includes a `LowerCaseFilter`, so the indexed terms would not contain leading capitals. Without knowing how you index your documents, I can't say with any certainty that this will fix it, but it seems the most likely possibility.

With that fixed, the query you've created should match anything with either the term "entry" or "artist" in the jobTitle field.
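
To illustrate the capitalization point, here is a minimal plain-Java sketch. It does not use the Lucene API: `indexTerms` is a hypothetical stand-in for an analyzer that lowercases tokens at index time, and `termMatches` is a hypothetical stand-in for an unanalyzed, exact term lookup. It only assumes that the field was indexed in lowercase, which may not be true of your setup.

```java
import java.util.HashSet;
import java.util.Locale;
import java.util.Set;

// Plain-Java model of an analyzed field; these names are illustrative, not Lucene classes.
final class CaseSensitivityDemo {

    // Stand-in for index-time analysis: split on whitespace and lowercase each token.
    static Set<String> indexTerms(String fieldValue) {
        Set<String> terms = new HashSet<>();
        for (String token : fieldValue.trim().split("\\s+")) {
            terms.add(token.toLowerCase(Locale.ROOT));
        }
        return terms;
    }

    // Stand-in for a term built by hand: compared exactly, with no analysis applied.
    static boolean termMatches(Set<String> indexedTerms, String rawTerm) {
        return indexedTerms.contains(rawTerm);
    }

    public static void main(String[] args) {
        Set<String> indexed = indexTerms("Entry Level Artist");

        // The capitalized term does not exist in the lowercased index.
        System.out.println(termMatches(indexed, "Entry"));                          // false

        // Lowercasing the term by hand before the lookup makes it match.
        System.out.println(termMatches(indexed, "Entry".toLowerCase(Locale.ROOT))); // true
    }
}
```

In the question's code, the equivalent change would be to build the terms from already-lowercased text, assuming the jobTitle field really is lowercased at index time.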


femtoRgon


Thanks, that did answer my question, but it also showed me that this is still not the route I need to go, lol. I need to get results for "Search Phrase" only within one field, rather than "search" from field x and "phrase" from field y, or an entry having only one of the terms. Thanks though! – Adam James Aug 06 '13 at 15:44
I think the problem is that you misunderstand how `MultiPhraseQuery` works. `add({term1, term2});` adds both at the same position in the phrase, as alternatives. `add(term1); add(term2);` adds them at consecutive positions, which is probably what you are looking for. I'd recommend reading the [documentation](http://lucene.apache.org/core/4_0_0/core/org/apache/lucene/search/MultiPhraseQuery.html) of this class carefully. – femtoRgon Aug 06 '13 at 15:49
Yeah, I'm just looking for the actual way people go about doing phrase searches with wildcards and such, and have turned up so many different (and often outdated) results that it's dizzying. I tried adding them individually, like `add(term1); add(term2);`, but it doesn't return anything. – Adam James Aug 06 '13 at 16:41
Just found this in the Lucene docs: "To use this class, to search for the phrase "Microsoft app*" first use add(Term) on the term "Microsoft", then find all terms that have "app" as prefix using IndexReader.terms(Term), and use MultiPhraseQuery.add(Term[] terms) to add them to the query." I feel like this is very inefficient: doing an initial query, then querying every combination of those results. – Adam James Aug 06 '13 at 16:50
That's exactly how Lucene goes about wildcard queries: by enumerating the terms that meet the criteria and generating "primitive queries" like `BooleanQuery` and `TermQuery`, which are then run against the index. You don't want to query for it, though. You want to get a `TermsEnum` from `AtomicReader.terms("field").iterator(TermsEnum);`, and then use the `seekCeil` method to seek right to the terms you need to enumerate for inclusion in your `MultiPhraseQuery`. – femtoRgon Aug 06 '13 at 17:04
You can also construct this sort of thing with a combination of `SpanQuery` types as well (`SpanTermQuery` for first term, `SpanMultiTermQueryWrapper` wrapping a `PrefixQuery` for the wildcard term, and combined with `SpanNearQuery` with slop = 0 and inorder = true). Lucene will still perform the step of enumerating terms, as it would with any prefix query. – femtoRgon Aug 06 '13 at 17:08
Huh, is `AtomicReader` only in the newer versions of Lucene? Apparently we are using 3.5. I was also trying to follow the example solution here: http://stackoverflow.com/questions/5075304/how-to-use-a-multiphrasequery. I assumed that his 'reader' object was an `IndexReader`, which has no method or property 'terms'. – Adam James Aug 06 '13 at 17:12
Don't ask me what version of Lucene you are using, I don't know. Anyway, see: [PrefixTermEnum](http://lucene.apache.org/core/3_5_0/api/core/org/apache/lucene/search/PrefixTermEnum.html) – femtoRgon Aug 06 '13 at 17:27
I think it might be time just to call it and go grab a beer... `File index = new File("/var/lucene/indexes"); Directory indexDirectory = FSDirectory.open(index); @SuppressWarnings("resource") PrefixTermEnum reader = new PrefixTermEnum(IndexReader.open, secondTerm); TermEnum te = reader; List termList = new LinkedList(); while (te.next()) { Term t = te.term(); if (!t.field().equals("field") || !t.text().startsWith(secondTerm.text())) { break; } termList.add(t); }` – Adam James Aug 06 '13 at 19:15
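
The ideas from the comments above (alternatives at one position versus consecutive positions, and expanding a prefix by walking the sorted term dictionary) can be sketched in plain Java. This is a simplified model, not Lucene code: `PrefixPhraseModel`, `expandPrefix`, and `matchesPhrase` are hypothetical names, a `TreeSet` stands in for the index's sorted term dictionary, and a list of tokens stands in for one analyzed field value.

```java
import java.util.Arrays;
import java.util.HashSet;
import java.util.List;
import java.util.Set;
import java.util.TreeSet;

// Toy model of a multi-phrase match; illustrative only, not the Lucene implementation.
final class PrefixPhraseModel {

    // Seek to the first dictionary term >= prefix, then collect terms
    // until one no longer starts with the prefix (terms are sorted, so we can stop).
    static Set<String> expandPrefix(TreeSet<String> dictionary, String prefix) {
        Set<String> expanded = new HashSet<>();
        for (String term : dictionary.tailSet(prefix, true)) {
            if (!term.startsWith(prefix)) {
                break;
            }
            expanded.add(term);
        }
        return expanded;
    }

    // positions.get(i) holds the alternative terms accepted at phrase position i.
    // The phrase matches if some run of consecutive tokens satisfies every position in order.
    static boolean matchesPhrase(List<String> tokens, List<Set<String>> positions) {
        if (positions.isEmpty()) {
            return false;
        }
        for (int start = 0; start + positions.size() <= tokens.size(); start++) {
            boolean allMatch = true;
            for (int i = 0; i < positions.size(); i++) {
                if (!positions.get(i).contains(tokens.get(start + i))) {
                    allMatch = false;
                    break;
                }
            }
            if (allMatch) {
                return true;
            }
        }
        return false;
    }

    public static void main(String[] args) {
        TreeSet<String> dictionary =
                new TreeSet<>(Arrays.asList("bar", "bear", "build", "foo", "zebra"));

        // "foo b*": one fixed term, then every dictionary term starting with "b".
        Set<String> first = new HashSet<>(Arrays.asList("foo"));
        Set<String> second = expandPrefix(dictionary, "b");
        List<Set<String>> phrase = Arrays.asList(first, second);

        System.out.println(matchesPhrase(Arrays.asList("foo", "bear"), phrase));      // true
        System.out.println(matchesPhrase(Arrays.asList("foo", "and", "bar"), phrase)); // false
    }
}
```

In this model, putting both terms into a single position (as the question's code does with one array) accepts a value containing either term alone, while two separate positions require the terms to appear next to each other in order.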
