
I'm curious how MongoDB stores a text index inside its metadata. I assume it's some kind of array of stopword-filtered and stemmed words, but I can't find any official documentation on this.

To be clearer:

I want to know how the indexed terms are stored inside the metadata, e.g. what a text index would look like for the document

{"content": "Dogs are man's best friend"}

where the field content has a text index. My guess is that it would be something like this:

{"words": ["dog", "man", "best", "friend"]} 

but I found no official statement confirming this.

Dgame
  • Possible duplicate of [MongoDB - Full Text Index - Full Text Search - stemming](https://stackoverflow.com/questions/22750643/mongodb-full-text-index-full-text-search-stemming) – Nilesh Singh Feb 11 '18 at 13:16
  • No, that is not the same. I've edited my post to be more specific. – Dgame Feb 11 '18 at 13:40
  • Implementation details and key format depend on the version of text index, whether the text index includes a prefix or suffix, and the length of terms being indexed. There have been several distinct text index versions which support different features (e.g. the v3 format introduced in MongoDB 3.2 which improves case, diacritic, and delimiter support). For specific details the official reference would be the MongoDB source code on GitHub. Suggested starting point: [db/fts/fts_index_format.cpp](https://github.com/mongodb/mongo/blob/master/src/mongo/db/fts/fts_index_format.cpp). – Stennie Feb 12 '18 at 05:28
  • I'm not sure if that fully answers your curiosity, but the technical details get into the weeds. The high-level concept is essentially what you've described: a vector of stemmed terms with stopwords removed. – Stennie Feb 12 '18 at 05:28
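
The high-level concept from the comments (tokenize, drop stopwords, stem) can be sketched in a few lines of Python. This is only an illustration under loud assumptions: the stopword list, the `naive_stem` helper, and the `text_index_terms` function are all hypothetical and are not MongoDB's actual code, stopword list, stemmer, or on-disk key format. For the real format, the linked `fts_index_format.cpp` remains the reference.

```python
import re

# Tiny illustrative stopword list; an assumption, NOT MongoDB's actual list.
STOPWORDS = {"a", "an", "and", "are", "is", "of", "the", "to"}


def naive_stem(word: str) -> str:
    """Toy stemmer (hypothetical): strips a possessive 's and a plural s."""
    if word.endswith("'s"):
        word = word[:-2]
    if len(word) > 3 and word.endswith("s") and not word.endswith("ss"):
        word = word[:-1]
    return word


def text_index_terms(text: str) -> list[str]:
    """Return the unique, stopword-filtered, stemmed terms of `text`."""
    tokens = re.findall(r"[a-z']+", text.lower())
    terms: list[str] = []
    for token in tokens:
        if token in STOPWORDS:
            continue
        stem = naive_stem(token)
        if stem and stem not in terms:
            terms.append(stem)
    return terms


print(text_index_terms("Dogs are man's best friend"))
# → ['dog', 'man', 'best', 'friend']
```

For the example document this yields the same term list guessed in the question. How those terms are then encoded as index keys depends on the text index version, as noted in the comments.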

0 Answers