
I have a document with the following schema:

{
  description : String,
  tags : [String]
}

I have indexed both fields as text, but when I search for a specific string within the array, the document is returned only if that string is the first element of the array. It therefore seems that the text index only covers the first element. Is this how MongoDB inherently works, or is there an option that must be passed to the index?

Example document

{
   description : 'random description',
   tags : ["hello", "there"]
}

The object that created the index

{description : 'text', tags : 'text'}

The query

db.myCollection.find({$text : {$search : 'hello'}});

returns a document but

db.myCollection.find({$text : {$search : 'there'}});

does not return anything.

I am using MongoDB version 2.6.11.

I have other indexes, but this is the only text index. Here is the corresponding entry from the output of db.myCollection.getIndexes():

{
    "v" : 1,
    "key" : {
        "_fts" : "text",
        "_ftsx" : 1
    },
    "name" : "description_text_tags_text",
    "ns" : "myDB.myCollection",
    "weights" : {
        "description" : 1,
        "tags" : 1
    },
    "default_language" : "english",
    "language_override" : "language",
    "textIndexVersion" : 2
}
naughty boy

1 Answer


This has nothing to do with whether the string is the first or second element of the array. The word "there" is in the stop-word list for the "english" language, so it is never added to the index. Text indexing stems the text and removes stop words before the terms are added to the text index, and both steps are language dependent.
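To illustrate the effect, here is a minimal sketch of stop-word filtering in plain JavaScript. It is a simplified model and not MongoDB's actual implementation: the `indexTerms` function is hypothetical, `ENGLISH_STOP_WORDS` is a tiny illustrative subset of the real list, and stemming is left out entirely.

```javascript
// Simplified model of text indexing; NOT MongoDB's actual implementation.
// ENGLISH_STOP_WORDS is a small illustrative subset (assumption), not the real list.
const ENGLISH_STOP_WORDS = new Set(["the", "a", "is", "there"]);

// Hypothetical helper: returns the set of terms that would end up in the index.
function indexTerms(values, language) {
  const terms = new Set();
  for (const value of values) {
    // Simple tokenization: lowercase, then split on anything non-alphanumeric.
    for (const token of value.toLowerCase().split(/[^a-z0-9]+/)) {
      if (token === "") continue;
      // Stop words are dropped only when a language with a stop-word list is used.
      if (language === "english" && ENGLISH_STOP_WORDS.has(token)) continue;
      terms.add(token);
    }
  }
  return terms;
}

const doc = { description: "random description", tags: ["hello", "there"] };
const values = [doc.description, ...doc.tags];

const englishIndex = indexTerms(values, "english");
const noneIndex = indexTerms(values, "none");

console.log(englishIndex.has("hello")); // true
console.log(englishIndex.has("there")); // false: filtered out as a stop word
console.log(noneIndex.has("there"));    // true: no stop-word list applied
```

In this model the position of "there" in the tags array plays no role; only the stop-word list decides whether the term is indexed.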

You could create the text index like this instead:

db.myCollection.ensureIndex({description : 'text', tags : 'text'}, { default_language: "none" }) 

If "none" is used as the default language, text indexing does only simple tokenization and applies no stop-word list. If "default_language" is not specified, the text index uses "english".
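Since a collection can have only one text index, the existing index has to be dropped before it can be recreated with different options. A minimal sketch of that sequence follows, assuming the index name "description_text_tags_text" shown in the question; the helper name `rebuildTextIndexWithoutStopWords` is hypothetical, and `collection` is expected to be a mongo shell collection object.

```javascript
// Hypothetical helper: `collection` is a mongo shell collection,
// for example db.myCollection.
function rebuildTextIndexWithoutStopWords(collection) {
  // Drop the existing text index by name (name taken from the
  // getIndexes() output in the question).
  collection.dropIndex("description_text_tags_text");

  // Recreate it with default_language "none": simple tokenization,
  // no stemming and no stop-word removal.
  collection.ensureIndex(
    { description: "text", tags: "text" },
    { default_language: "none" }
  );
}
```

In the shell this would be called as `rebuildTextIndexWithoutStopWords(db.myCollection)`, after which `db.myCollection.find({$text : {$search : 'there'}})` should match the example document.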

Nipun Talukdar
  • Note that you must drop the index before recreating it. You can also specify the language for a single query via the `$language` property of `$text`. – Explosion Pills Dec 28 '15 at 03:13
  • Great catch, I was using those terms as placeholders, and I doubt the content within the final app will ever do so, but I sure learned something.. Thanks – naughty boy Dec 28 '15 at 04:35