MongoDB text index diacritic insensitivity issue with Turkish

Asked Dec 21 '17 at 06:41

Active Dec 21 '17 at 06:45

I have a MongoDB 3.6 database whose documents contain Turkish characters ("ı, ö, ğ, ü, ş, ç"), with a text index on one field. I created the index with Turkish as the default language:

db.myCollection.createIndex(
    { myField: "text" },
    { default_language: "turkish" }
)

However, when I run a text search on myField, the index treats only some characters with diacritical marks as equal to their unmarked counterparts. For instance, it does not distinguish between 'ö' and 'o', but it does distinguish between 'ı' and 'i'.
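A possible explanation, stated as an assumption rather than confirmed behaviour of the index: 'ö' decomposes canonically into 'o' plus a combining diaeresis, whereas the dotless 'ı' (U+0131) is a separate base letter with no decomposition, so removing diacritics never turns it into 'i'. The sketch below checks the Unicode side of this in plain JavaScript; stripMarks is a hypothetical helper, not a MongoDB function.

```javascript
// Hypothetical helper: decompose to NFD, then drop combining marks
// (Unicode general category Mn). This only imitates diacritic folding;
// it is not MongoDB's own implementation.
function stripMarks(s) {
  return s.normalize("NFD").replace(/\p{Mn}/gu, "");
}

const letters = ["ı", "ö", "ğ", "ü", "ş", "ç"];
const stripped = letters.map(stripMarks);

// Every letter except the dotless ı collapses to a plain ASCII letter.
console.log(stripped); // [ 'ı', 'o', 'g', 'u', 's', 'c' ]
```

If the text index folds diacritics in a similar way, that would match what I am seeing: "Kanık" is indexed with 'ı' intact and never equals "kanik".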

My goal is to look up names in the database. Because the text search does not behave as I had assumed, the following query:

db.myCollection.find(
    { $text: { $search: "kanik" } }
)

does not return the document {myField: "Kanık"}.

Any ideas on how I might fix or circumvent this issue?
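One workaround that might apply, sketched under the assumption that a folded copy of the name can be stored next to myField and searched instead: map 'ı' and 'İ' to 'i' explicitly, strip the remaining combining marks, and lowercase the result. The names foldForSearch and myFieldFolded are hypothetical and exist only for this illustration.

```javascript
// Hypothetical helper: fold a name to a lowercase, mark-free form.
// The dotted/dotless i pair is handled explicitly because neither
// ı (U+0131) nor the result of lowercasing İ (U+0130) is a plain "i".
function foldForSearch(s) {
  return s
    .replace(/[ıİ]/g, "i")      // Turkish i variants -> ASCII i
    .normalize("NFD")            // split base letters from combining marks
    .replace(/\p{Mn}/gu, "")     // drop the combining marks
    .toLowerCase();
}

// Possible shell usage (assumed field name myFieldFolded, placeholder docId):
//   db.myCollection.updateOne({ _id: docId },
//       { $set: { myFieldFolded: foldForSearch("Kanık") } })
//   db.myCollection.find({ $text: { $search: foldForSearch("kanik") } })

console.log(foldForSearch("Kanık")); // kanik
```

The same folding would have to be applied both when writing documents and to every search string. Since a collection can have only one text index, the existing index on myField would need to be dropped or redefined to cover the folded field.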

Kaan Dönbekci

It seems that MongoDB 3.6 text indexing does not support diacritics. Check [the documentation](https://docs.mongodb.com/manual/core/index-text/index.html#diacritic-insensitivity). – Andriy Simonov Dec 21 '17 at 11:46

0 Answers