I have a case where I need to run a $text $search in MongoDB that matches exact tokens within a string. I thought I could solve this by creating a text index without a default language and wrapping each token in escaped quotes (\"token\"), as described in the documentation. So I created my index this way:

db.collection.createIndex({"denom": "text"}, {"default_language": "none"})

And the query I have to perform is

db.collection.find( {"$text": {"$search": "\"consorzio\" \"la\""}}, {"denom": 1} )

The result I was expecting is all documents that contain exactly the tokens "consorzio" and "la", but instead this query matches documents in which the strings "la" and "consorzio" appear anywhere inside a token.

For example, the query above returns the following denom values, each marked with whether I expected it (OK) or not (WRONG):

  • CONSORZIO LA* CASCINA OK
  • LA RADA CONSORZIO OK
  • GESCO CONSORZIO AGRICOLA WRONG

Can someone point me in the right direction, please? I hope the problem is clear.

Thank you very much in advance.

Fernando Aspiazu

2 Answers

MongoDB has a reported bug for this issue: exact matching does not work.

You can take a look at the matching score:

db.docs.find({$text: {$search: "\"consorzio\" \"la\""}}, 
             {score: { $meta: "textScore" }, "_id": 0})

{ "t" : "CONSORZIO LA* CASCINA OK", "score" : 1.25 } 
{ "t" : "LA RADA CONSORZIO OK", "score" : 1.25 }
{ "t" : "GESCO CONSORZIO AGRICOLA WRONG", "score" : 0.625 }

A workaround would be to keep only the results with the highest scores ...
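
The idea of keeping only the highest scores can be sketched on the client side. This is a minimal illustration in plain JavaScript, not a MongoDB feature: the helper name keepTopScoring is hypothetical, and the input array simply copies the three results shown above.

```javascript
// Hypothetical helper: keep only the documents that share the highest textScore.
function keepTopScoring(docs) {
  if (docs.length === 0) return [];
  // Find the best score among all returned documents.
  const best = Math.max(...docs.map((d) => d.score));
  // Keep every document whose score equals that best score.
  return docs.filter((d) => d.score === best);
}

// Sample data copied from the query output above.
const results = [
  { t: "CONSORZIO LA* CASCINA OK", score: 1.25 },
  { t: "LA RADA CONSORZIO OK", score: 1.25 },
  { t: "GESCO CONSORZIO AGRICOLA WRONG", score: 0.625 },
];

const top = keepTopScoring(results);
console.log(top.map((d) => d.t));
```

Note that this only helps when at least one document really matches every token; if none does, the top-scoring group may still contain partial matches.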

Iulian Stana

Fernando, you are actually mistaken: the query does match GESCO CONSORZIO AGRICOLA WRONG, but it matches only one word (token) of your search, namely consorzio, not la.

In a text search, textScore will be greater than 1 when the document matches all the tokens of the query.

For example, here is a stores collection:

db.stores.insert(
   [
     { _id: 1, name: "Java Hut", description: "Coffee and cakes" },
     { _id: 2, name: "Burger Buns", description: "Gourmet hamburgers" },
     { _id: 3, name: "Java Coffee Shop", description: "Just coffee" },
     { _id: 4, name: "Clothes Clothes Clothes", description: "Discount clothing" },
     { _id: 5, name: "Java Shopping", description: "Indonesian goods" },
     { _id: 6, name: "Java Shopping", description: "Indonesian goods" }
   ]
)

Index:

db.stores.createIndex( { name: "text" } )

Now if I run this query:

db.stores.find({
    $text: {
        $search: "Java Shop"
    }
}, {
    score: {
        $meta: "textScore"
    }
}).sort({
    score: {
        $meta: "textScore"
    },
    _id: -1
})

It matches the tokens, and the result is:

/* 1 */
{
    "_id" : 6.0,
    "name" : "Java Shopping",
    "description" : "Indonesian goods",
    "score" : 1.5
}

/* 2 */
{
    "_id" : 5.0,
    "name" : "Java Shopping",
    "description" : "Indonesian goods",
    "score" : 1.5
}

/* 3 */
{
    "_id" : 3.0,
    "name" : "Java Coffee Shop",
    "description" : "Just coffee",
    "score" : 1.33333333333333
}

/* 4 */
{
    "_id" : 1.0,
    "name" : "Java Hut",
    "description" : "Coffee and cakes",
    "score" : 0.75
}

Here you can see that the first three documents match all the tokens, which is why their score is greater than 1. The last document's score is less than 1 because it matched only one token.

You can also get the best document that matched all the tokens, which in this case means a score greater than 1. To do that, we need to use the MongoDB aggregation framework.

db.stores.aggregate([
  { "$match": { "$text": { "$search": "Java Shop" } } },
  { "$addFields": { "score": { "$meta": "textScore" } } },
  { "$match": { "score": { "$gt": 1.0 } } },
  { "$sort": { "score": -1, "_id": -1 } },
  { "$limit": 1 }
])

And here is the result:

/* 1 */
{
    "_id" : 6.0,
    "name" : "Java Shopping",
    "description" : "Indonesian goods",
    "score" : 1.5
}
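
Relating this back to the original question: if the text search is used only to narrow down candidates, a whole-token check can be applied afterwards in application code. The sketch below is plain JavaScript and makes its own assumptions: tokens are split on anything that is not a letter or digit, comparison is case-insensitive, and the helper names tokenize and hasAllTokens are hypothetical. The sample values are the denom strings from the question, without the OK/WRONG markers.

```javascript
// Hypothetical helper: split a string into lowercase tokens,
// treating any run of non-letter, non-digit characters as a separator.
function tokenize(text) {
  return text.toLowerCase().split(/[^\p{L}\p{N}]+/u).filter(Boolean);
}

// True only when every required token appears as a whole token in the text.
function hasAllTokens(text, required) {
  const tokens = new Set(tokenize(text));
  return required.every((r) => tokens.has(r.toLowerCase()));
}

// Candidate values, as they might come back from the $text query.
const denoms = [
  "CONSORZIO LA* CASCINA",
  "LA RADA CONSORZIO",
  "GESCO CONSORZIO AGRICOLA",
];

// Keep only the values that contain both "consorzio" and "la" as whole tokens.
const exact = denoms.filter((d) => hasAllTokens(d, ["consorzio", "la"]));
console.log(exact);
```

With this check, AGRICOLA no longer counts as a match for la, because the comparison is made against whole tokens rather than substrings.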
ARIF MAHMUD RANA