I can't figure out how to make a phrase query work. It returns exact matches, but the slop option doesn't seem to make a difference.
Here's my code:

static void Main(string[] args)
    { 
     using (Directory directory = new RAMDirectory())
        {
            Analyzer analyzer = new StandardAnalyzer(Lucene.Net.Util.Version.LUCENE_29);

            using (IndexWriter writer = new IndexWriter(directory, analyzer, true, IndexWriter.MaxFieldLength.UNLIMITED))
            {
                // index a few documents
                writer.AddDocument(createDocument("1", "henry morgan"));
                writer.AddDocument(createDocument("2", "henry junior morgan"));
                writer.AddDocument(createDocument("3", "henry immortal jr morgan"));
                writer.AddDocument(createDocument("4", "morgan henry"));
            }

            // search for documents that have "henry morgan" in them
            String sentence = "henry morgan";
            IndexSearcher searcher = new IndexSearcher(directory, true);
            PhraseQuery query = new PhraseQuery()
            {
                //allow inverse order
                Slop = 3
            };

            query.Add(new Term("contents", sentence));

            // display search results
            List<string> results = new List<string>();
            Console.WriteLine("Looking for \"{0}\"...", sentence);
            TopDocs topDocs = searcher.Search(query, 100);
            foreach (ScoreDoc scoreDoc in topDocs.ScoreDocs)
            {
                var matchedContents = searcher.Doc(scoreDoc.Doc).Get("contents");
                results.Add(matchedContents);
                Console.WriteLine("Found: {0}", matchedContents);
            }
        }
    }

private static Document createDocument(string id, string content)
    {
        Document doc = new Document();
        doc.Add(new Field("id", id, Field.Store.YES, Field.Index.NOT_ANALYZED));
        doc.Add(new Field("contents", content, Field.Store.YES, Field.Index.ANALYZED, Field.TermVector.WITH_POSITIONS_OFFSETS));
        return doc;
    }

I expected every document except the one with id=3 to match, but only the first one does. Did I miss something?

xlecoustillier
chester89

1 Answer

From Lucene in Action, 2nd edition, section 3.4.6, "Searching by phrase: PhraseQuery":

PhraseQuery uses this information to locate documents where terms are within a certain distance of one another

Sure, a plain TermQuery would do the trick to locate this document knowing either of those words, but in this case we only want documents that have phrases where the words are either exactly side by side (quick fox) or have one word in between (quick [irrelevant] fox)

So PhraseQuery operates on separate terms, and the sample code in that chapter confirms it. Because you use StandardAnalyzer, "henry morgan" is indexed as two terms, henry and morgan. Therefore you cannot add "henry morgan" as a single Term.

/*
   Sets the number of other words permitted between words
   in query phrase. If zero, then this is an exact phrase search.
*/
public void setSlop(int s) { slop = s; }

The definition of setSlop explains the rest. After a small change to your code, I got it working:

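// code in Scala
import org.apache.lucene.index.Term
import org.apache.lucene.search.PhraseQuery

val query = new PhraseQuery()
query.setSlop(3)
// add each analyzed word as its own term
List("henry", "morgan").foreach { word =>
  query.add(new Term("contents", word))
}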

In this case, all four documents match. If you have further problems, I suggest reading that chapter of Lucene in Action, 2nd edition.
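
To see why a slop of 3 is enough for every document, here is a rough sketch in plain Java that does not use Lucene at all. It is a simplified model for a two-term phrase, not Lucene's actual matching code: it assumes slop counts how many positions the second term must move to sit directly after the first, and it uses a whitespace tokenizer as a stand-in for StandardAnalyzer. The class name SlopSketch and its methods are made up for illustration.

```java
import java.util.Arrays;
import java.util.List;

// Simplified model of sloppy phrase matching for a two-term phrase.
// This is NOT Lucene's implementation; it only mirrors the idea that slop
// counts how many positions a term must move to line up with the query.
class SlopSketch {

    // Stand-in for an analyzer (assumption): lowercase and split on whitespace.
    static List<String> tokenize(String text) {
        return Arrays.asList(text.toLowerCase().trim().split("\\s+"));
    }

    // Smallest slop at which the phrase "first second" would match the
    // document under this model, or -1 if either term is missing.
    static int requiredSlop(String document, String first, String second) {
        List<String> tokens = tokenize(document);
        int best = -1;
        for (int i = 0; i < tokens.size(); i++) {
            if (!tokens.get(i).equals(first)) continue;
            for (int j = 0; j < tokens.size(); j++) {
                if (!tokens.get(j).equals(second)) continue;
                // In the query, "second" sits one position after "first".
                int moves = Math.abs(j - 1 - i);
                if (best < 0 || moves < best) best = moves;
            }
        }
        return best;
    }

    public static void main(String[] args) {
        String[] docs = {
            "henry morgan",
            "henry junior morgan",
            "henry immortal jr morgan",
            "morgan henry"
        };
        for (String doc : docs) {
            System.out.println(doc + " -> " + requiredSlop(doc, "henry", "morgan"));
        }
    }
}
```

Under this model the four documents need a slop of 0, 1, 2 and 2 respectively, so a slop of 3 covers all of them. The reversed document needs 2 because swapping two adjacent words takes two moves.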

Allen Chou