
I'm having trouble with the full text search functionality in MongoDB, specifically when trying to match exact phrases.

I'm testing out the functionality in the mongo shell, but ultimately I'll be using Spring Data MongoDB with Java.

I first ran this command to search for the words "delay" and "late" and the phrase "on time":

db.mycollection.find( { $text: { $search: "delay late \"on time\"" } }).explain(true);

The resulting explain output contained:

"parsedTextQuery" : {
    "terms" : [
        "delay",
        "late",
        "time"
    ],
    "negatedTerms" : [ ],
    "phrases" : [
        "on time"
    ],
    "negatedPhrases" : [ ]
},

The issue is that I don't want to search for the word "time", only for the phrase "on time". I do want to search for "delay" and "late", and ideally I don't want to prevent stemming.

I tried a few different permutations, e.g.:

db.mycollection.find( { $text: { $search: "delay late \"'on time'\"" } }).explain(true);

db.mycollection.find( { $text: { $search: "delay late \"on\" \"time\"" } }).explain(true);

But I couldn't get the right results, and I can't see anything obvious in the documentation about this.

For my purposes, should I use full text search for individual words and regex search for phrases?

Currently working with MongoDB version 2.6.5. Thanks.

robarthur1

1 Answer


Did you run the actual search and see it behave incorrectly? It works as expected for me on MongoDB 2.6.7:

> db.test.drop()
> db.test.insert({ "t" : "I'm on time, not late or delayed" })
> db.test.insert({ "t" : "I'm either late or delayed" })
> db.test.insert({ "t" : "Time flies like a banana" })
> db.test.ensureIndex({ "t" : "text" })

> db.test.find({ "$text" : { "$search" : "time late delay" } }, { "_id" : 0 })
{ "t" : "I'm on time, not late or delayed" }
{ "t" : "Time flies like a banana" }
{ "t" : "I'm either late or delayed" }

> db.test.find({ "$text" : { "$search" : "late delay" } }, { "_id" : 0 })
{ "t" : "I'm on time, not late or delayed" }
{ "t" : "I'm either late or delayed" }

> db.test.find({ "$text" : { "$search" : "late delay \"on time\"" } }, { "_id" : 0 })
{ "t" : "I'm on time, not late or delayed" }

Why is "time" in the terms array in the explain output? Because if the phrase "on time" occurs in a document, the term "time" must occur in it too. MongoDB uses the text index as far as it can to locate candidate documents, and then checks those candidates to see which ones actually contain the full phrase and not just the terms in the phrase.
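To make the two stages described above concrete, here is a toy model in plain JavaScript. It is only a sketch of the observed behaviour, not MongoDB's implementation: `tokenize`, `stem` and `textSearch` are hypothetical helpers, and the "stemmer" merely strips a trailing "ed".

```javascript
// Toy model only: NOT how MongoDB is implemented internally.
// tokenize, stem and textSearch are hypothetical helper names.

// Lowercase the text and split on anything that is not a letter or apostrophe.
function tokenize(text) {
  return text.toLowerCase().split(/[^a-z']+/).filter(Boolean);
}

// Crude stand-in for real stemming: strip a trailing "ed" ("delayed" -> "delay").
function stem(word) {
  return word.replace(/ed$/, "");
}

function textSearch(docs, terms, phrases) {
  const wanted = new Set(terms.map(stem));
  return docs.filter(function (doc) {
    // Stage 1: the document must contain at least one of the terms (OR).
    const hasTerm = tokenize(doc).map(stem).some(function (s) {
      return wanted.has(s);
    });
    // Stage 2: every quoted phrase must appear in the text (AND).
    const lower = doc.toLowerCase();
    const hasPhrases = phrases.every(function (p) {
      return lower.includes(p.toLowerCase());
    });
    return hasTerm && hasPhrases;
  });
}

const docs = [
  "I'm on time, not late or delayed",
  "I'm either late or delayed",
  "Time flies like a banana"
];

// Terms and phrases as reported by parsedTextQuery in the question.
// Only the first document survives the phrase check.
console.log(textSearch(docs, ["delay", "late", "time"], ["on time"]));
```

With an empty phrase list the same function returns every document containing any of the terms, which mirrors the first two queries in the shell session.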

wdberkeley
  • Hi, thanks for your response. I get the same results as your example; I guess my confusion is still around the third example query you provided. I would expect that query to match the third and second records you inserted, as the second one matches "late" and "delay". I can't tell whether it ignores the other terms when looking for a phrase. – robarthur1 Feb 09 '15 at 08:35
  • After a bit of searching, I get the impression that when searching for phrases it uses a logical AND rather than the OR it uses with individual words. Do you know of a way to search for multiple words and phrases using a logical OR without running multiple queries and filtering out duplicates? Thanks. – robarthur1 Feb 09 '15 at 09:09
  • Can you give me an example of what you want? I think that'll be easiest for me to understand. It should be a new question at this point, I think. – wdberkeley Feb 09 '15 at 15:45
  • Hi @wdberkeley, thanks for your response. I've posted a follow-up question here: http://stackoverflow.com/questions/28428288/mongodb-logical-or-when-searching-for-words-and-phrases-using-full-text-search – robarthur1 Feb 10 '15 at 09:36