How does MongoDB process unsupported languages?

Asked Jun 09 '15 at 05:22

Active Jun 09 '15 at 05:22

I have documents with a text field. The text can be in any of dozens of languages, most of which MongoDB supports (English, Russian, German, French, etc.). Each document also has a language field that tells MongoDB the language of its text field. How does MongoDB handle unsupported languages, such as Urdu or Swahili? A post about MongoDB 2.4 suggests that unsupported languages cannot be indexed. An answer to this question suggests that unsupported languages are indexed but not lemmatized. For my case, it is fine if no lemmatization is performed.
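To make the setup concrete, here is a minimal sketch in Python of one way to handle the unsupported case. It relies on the documented special language value "none", which asks the text index for simple tokenization with no stop words and no stemming. The helper names (`index_language`, `prepare`), the field names `text` and `language`, and the supported-language list (taken from the MongoDB 3.0-era documentation) are assumptions for illustration, not something confirmed for every server version.

```python
# Languages MongoDB text search can stem via Snowball
# (assumed from the 3.0-era documentation; check your server version).
SUPPORTED = frozenset({
    "danish", "dutch", "english", "finnish", "french", "german",
    "hungarian", "italian", "norwegian", "portuguese", "romanian",
    "russian", "spanish", "swedish", "turkish",
})


def index_language(lang):
    """Return a value that is safe to store in the language field.

    Supported languages pass through (lowercased); anything else,
    including a missing value, maps to "none" so the text is still
    tokenized and indexed, just without stemming or stop words.
    """
    name = (lang or "").strip().lower()
    return name if name in SUPPORTED else "none"


def prepare(doc, field="language"):
    """Return a copy of doc with its language field normalized."""
    out = dict(doc)
    out[field] = index_language(doc.get(field))
    return out


# Hypothetical usage with PyMongo (not executed here, needs a server):
#   coll.create_index([("text", "text")],
#                     default_language="english",
#                     language_override="language")
#   coll.insert_one(prepare({"text": "habari ya asubuhi",
#                            "language": "swahili"}))
```

With this approach, a Swahili or Urdu document would be stored with `language: "none"`, so it should still be searchable by exact tokens, which matches the "indexed but not lemmatized" behavior I am after.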

edited May 23 '17 at 11:51

Community

asked Jun 09 '15 at 05:22

ZacharyST

You can encode the text and then store it, so that it still gets indexed. – Mohsen Shakiba Jun 09 '15 at 05:51

But what if the language is something like Arabic that MongoDB does not stem (because the Snowball software it uses does not)? – ZacharyST Jun 09 '15 at 06:14

Actually, I did a project on Arabic a while ago, and it stored Arabic just fine in the database with no encoding required. Regular indexing worked fine too, though I'm not sure about text indexing. – Mohsen Shakiba Jun 09 '15 at 06:24

Yeah, I think it indexes but just doesn't lemmatize. – ZacharyST Jun 09 '15 at 21:03

0 Answers