2

I'm attempting to perform a find query in mongo (4.2) which returns matches for a single specified field that can handle substring match, case insensitive match and diacritic match.

(The query will ultimately be performed using js/express, but I don't believe that is relevant - examples using shell)

I have found various solutions for finding subsets of this criteria, but nothing for all of them.

Assuming a collection:

db.stuff = [
 { name: "Björn Smith", city: "Örebro"},
 { name: "John Franzén", city: "Norrköping"}
 { name: "Henry Jones", city: "Smith Town"}
]

I would like to be able to find:

  • the first record using: name = 'bjorn' (substring + insensitive + diacritic)
  • the second query using city = 'norrko' (substring + insensitive + diacritic)
  • and ONLY find the first record when searching name = 'smith' ($text would return 2 results)

Method 1: collation

Works for diacritic

db.stuff.find({name: 'Bjorn Smith'}).collation({ locale: 'en', strength: 1 })

Doesn't Work for substrings

db.stuff.find({name: 'Bjorn'}).collation({ locale: 'en', strength: 1 })

Method 2: regexp with collation

Works for substrings

db.stuff.find({name: {$regex: 'Björn'}}).collation({ locale: 'en', strength: 1 })

Doesn't Work for diacritic

db.stuff.find({name: {$regex: 'Bjorn'}}).collation({ locale: 'en', strength: 1 })

Method 3: indexOfCP

(credit: https://stackoverflow.com/a/56808870)

Works for substrings

db.stuff.find({ $expr: { $gt: [{ $indexOfCP: [ { $toLower: '$name' }, 'björn'] }, -1] } }).collation({ locale: 'en', strength: 1 })

Doesn't work for diacritic

db.stuff.find({ $expr: { $gt: [{ $indexOfCP: [ { $toLower: '$name' }, 'bjorn'] }, -1] } }).collation({ locale: 'en', strength: 1 })

Method 4: $text

This works for substring and diacritic, but I want to be able to specify which field I'm searching for, and this leverages index to search across all fields

Works for diacritic

db.stuff.createIndex( { name: "text", city: "text" } )
db.stuff.find({$text: {$search: 'Bjorn'}})

Doesn't work for substring within a word

db.stuff.find({$text: {$search: 'Bjo'}})

Returns too many results across multiple fields

db.stuff.find({$text: {$search: 'Smith'}})
cafin8d
  • 21
  • 1

0 Answers0