I'm attempting to perform a find query in mongo (4.2) which returns matches for a single specified field that can handle substring match, case insensitive match and diacritic match.
(The query will ultimately be performed using js/express, but I don't believe that is relevant - examples using shell)
I have found various solutions for finding subsets of this criteria, but nothing for all of them.
Assuming a collection:
db.stuff = [
{ name: "Björn Smith", city: "Örebro"},
{ name: "John Franzén", city: "Norrköping"}
{ name: "Henry Jones", city: "Smith Town"}
]
I would like to be able to find:
- the first record using: name = 'bjorn' (substring + insensitive + diacritic)
- the second query using city = 'norrko' (substring + insensitive + diacritic)
- and ONLY find the first record when searching name = 'smith' ($text would return 2 results)
Method 1: collation
Works for diacritic
db.stuff.find({name: 'Bjorn Smith'}).collation({ locale: 'en', strength: 1 })
Doesn't Work for substrings
db.stuff.find({name: 'Bjorn'}).collation({ locale: 'en', strength: 1 })
Method 2: regexp with collation
Works for substrings
db.stuff.find({name: {$regex: 'Björn'}}).collation({ locale: 'en', strength: 1 })
Doesn't Work for diacritic
db.stuff.find({name: {$regex: 'Bjorn'}}).collation({ locale: 'en', strength: 1 })
Method 3: indexOfCP
(credit: https://stackoverflow.com/a/56808870)
Works for substrings
db.stuff.find({ $expr: { $gt: [{ $indexOfCP: [ { $toLower: '$name' }, 'björn'] }, -1] } }).collation({ locale: 'en', strength: 1 })
Doesn't work for diacritic
db.stuff.find({ $expr: { $gt: [{ $indexOfCP: [ { $toLower: '$name' }, 'bjorn'] }, -1] } }).collation({ locale: 'en', strength: 1 })
Method 4: $text
This works for substring and diacritic, but I want to be able to specify which field I'm searching for, and this leverages index to search across all fields
Works for diacritic
db.stuff.createIndex( { name: "text", city: "text" } )
db.stuff.find({$text: {$search: 'Bjorn'}})
Doesn't work for substring within a word
db.stuff.find({$text: {$search: 'Bjo'}})
Returns too many results across multiple fields
db.stuff.find({$text: {$search: 'Smith'}})