Some documents do not appear in Atlas Search when querying by a few letters

Question

I have a collection. The document structure is,

{
  model: {
    name: 'string name'
  }
}

I have enabled Atlas Search and created a search index for the model.name field. Search works fine; the only issue is that I can't get results when the query contains just the first few letters of a value.

Example:

I have a document,

{
  model: {
     name: "space1duplicate"
  }
}

If I query space, I don't get the result.

{
  index: 'search_index',
  compound: {
    must: [
      {
        text: {
          query: 'space',
          path: 'model.name'
        }
      }
    ]
  }
}

But if I query space1duplica, it returns the result.

qwerty · Answer 1 · 2022-05-27T18:32:55.097

During indexing, the full-text search engine tokenizes the input, splitting the text into searchable chunks. Check out the relevant section in the documentation.

By default, Atlas Search does not split words on digits. If you need that, try defining a custom analyzer with the regexSplit tokenizer and use it for your field:

{
  "mappings": {
    "dynamic": false,
    "fields": {
      "name": [
        {
          "analyzer": "digitSplitter",
          "type": "string"
        }
      ]
    }
  },
  "analyzers": [
    {
      "charFilters": [],
      "name": "digitSplitter",
      "tokenFilters": [],
      "tokenizer": {
        "pattern": "[0-9]+",
        "type": "regexSplit"
      }
    }
  ]
}
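
As a rough illustration of what the `regexSplit` tokenizer above does with the pattern `[0-9]+`, the following plain JavaScript sketch splits a value on runs of digits. It only approximates the Lucene behaviour, and the helper name `digitSplitTokens` is made up for this example, but it shows why `space` becomes a token of its own:

```javascript
// Hypothetical helper: mimics a regexSplit tokenizer whose pattern is [0-9]+.
// The matched digits act as delimiters and are dropped from the output.
function digitSplitTokens(text) {
  return text.split(/[0-9]+/).filter((token) => token.length > 0);
}

console.log(digitSplitTokens("space1duplicate")); // [ 'space', 'duplicate' ]
```

Because `tokenFilters` is empty in the definition above, the tokens probably keep their original case; adding a `lowercase` token filter is worth considering if queries should be case-insensitive.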

Also note that you can use multiple analyzers for string fields, if needed.
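
A minimal sketch of that idea, assuming the `multi` option of string field mappings: the field keeps its primary analyzer and gets a second one under an alternate name. The alternate name `digitSplit` is illustrative, and `digitSplitter` refers to the custom analyzer defined above.

```javascript
// Field mapping: primary analyzer plus an alternate analyzer exposed via "multi".
const nameFieldMapping = {
  type: "string",
  analyzer: "lucene.standard",
  multi: {
    digitSplit: { type: "string", analyzer: "digitSplitter" }
  }
};

// Query clause that targets the alternate analyzer instead of the primary one.
const textClause = {
  text: {
    query: "space",
    path: { value: "name", multi: "digitSplit" }
  }
};

console.log(JSON.stringify({ nameFieldMapping, textClause }, null, 2));
```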

score 2 · Accepted Answer · edited May 27 '22 at 16:35

Atlas Search uses Lucene to do the job. The documentation on the MongoDB site mostly focuses on the MongoDB-specific syntax for passing the query to Lucene, and it can be a bit confusing if you are not familiar with Lucene's query language.

First of all, there are a number of tokenizers and analyzers available, each serving a specific purpose. You really need to include the index definition when you ask questions about Atlas Search.

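The default analyzer uses word separators to build the index. Language-specific analyzers additionally remove endings to store stems. "space1duplicate" contains no separator, so it is indexed as a single token, and a text query for "space" does not match it.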
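
To see why a query for `space` misses the document, here is a simplified sketch of word-boundary tokenization. It is an approximation written for illustration, not Lucene's actual tokenizer: it lowercases the text and splits on anything that is not a letter or a digit. Both function names are made up for this example.

```javascript
// Simplified stand-in for a word-boundary tokenizer (not the real Lucene code).
function wordTokens(text) {
  return text.toLowerCase().split(/[^\p{L}\p{N}]+/u).filter(Boolean);
}

// A text query matches whole indexed tokens, not parts of them.
function textMatches(fieldValue, query) {
  const indexed = new Set(wordTokens(fieldValue));
  return wordTokens(query).some((token) => indexed.has(token));
}

console.log(wordTokens("space1duplicate"));               // [ 'space1duplicate' ]
console.log(textMatches("space1duplicate", "space"));     // false
console.log(textMatches("space 1 duplicate", "space"));   // true
```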

So in order to find "space1duplicate" by the beginning of the word, you can use the "autocomplete" field type with nGram tokenization. The index should be created as follows:

{
  "mappings": {
    "dynamic": false,
    "fields": {
      "name": {
        "tokenization": "nGram",
        "type": "autocomplete"
      }
    }
  },
  "storedSource": {
    "include": [
      "name"
    ]
  }
}
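
For intuition about what `nGram` tokenization stores, the sketch below enumerates the substrings of a value. It assumes `minGrams` and `maxGrams` values of 2 and 15, which I believe are the defaults but which are not set explicitly in the definition above, and the function name `nGrams` is made up for this example.

```javascript
// Hypothetical helper: every substring of length minGrams..maxGrams, lowercased.
function nGrams(text, minGrams = 2, maxGrams = 15) {
  const s = text.toLowerCase();
  const grams = new Set();
  for (let start = 0; start < s.length; start++) {
    for (let len = minGrams; len <= maxGrams && start + len <= s.length; len++) {
      grams.add(s.slice(start, start + len));
    }
  }
  return grams;
}

const grams = nGrams("space1duplicate");
console.log(grams.has("spa"));   // true: a short prefix is indexed
console.log(grams.has("ace1"));  // true: nGram also covers the middle of the value
console.log(grams.has("s"));     // false: shorter than minGrams
```

If only prefix matches are needed, `edgeGram` tokenization should produce a much smaller set of tokens, since it keeps only the substrings that start at the beginning of each word.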

Once it's indexed (you may need to wait a bit if you have a larger dataset), you can find the document with the following search:

{
  index: 'search_index',
  compound: {
    must: [
      {
        autocomplete: {
          query: 'spa',
          path: 'name'
        }
      }
    ]
  }
}
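
The object above is the body of a `$search` stage. Here is a sketch of how it might be wrapped in an aggregation pipeline, using this answer's `name` path and `search_index` index name. The helper `buildAutocompletePipeline` and its `limit` parameter are assumptions of this example, and the `compound`/`must` wrapper is dropped because there is only one clause.

```javascript
// Hypothetical helper: wraps the autocomplete query in an aggregation pipeline.
function buildAutocompletePipeline(prefix, limit = 10) {
  return [
    {
      // $search has to be the first stage of the pipeline.
      $search: {
        index: "search_index",
        autocomplete: { query: prefix, path: "name" },
        // Return the stored "name" value straight from the search index.
        returnStoredSource: true
      }
    },
    { $limit: limit }
  ];
}

// Usage with mongosh or the Node.js driver, for a collection handle `coll`:
//   coll.aggregate(buildAutocompletePipeline("spa"))
console.log(JSON.stringify(buildAutocompletePipeline("spa"), null, 2));
```

For the document structure in the question, the path would be `model.name`, provided the index maps that nested field.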

edited May 27 '22 at 16:35

silent-box

answered May 27 '22 at 14:14

Alex Blex

`@Alex` In your index definition you added the field `name` to `storedSource`. But the documentation says **Atlas Search doesn't index stored fields and so you can't query these fields**. Still, I can query `name` even though I added it to `storedSource`. I am confused now. – BadPiggie Aug 03 '22 at 10:55
@BadPiggie I can say! I am confused with the comment itself =). Is the answer wrong, or what are you trying to say? – Alex Blex Aug 03 '22 at 11:10
No, your answer works perfectly. But the [documentation of Stored Source](https://www.mongodb.com/docs/atlas/atlas-search/stored-source-definition/#std-label-fts-stored-source-definition) says `Atlas Search doesn't index stored fields and so you can't query these fields.` In our case, the `name` field is defined in `storedSource`, but I can still query it. That is why I am confused about `Stored Source`. – BadPiggie Aug 03 '22 at 11:32
@BadPiggie It queries the n-grams of the fields defined under `fields`. `storedSource` is there to return the string straight from Lucene without fetching the document from the database. It's autocompletion - name is the only string that matters, so you can set `"returnStoredSource": true` in the search query. – Alex Blex Aug 03 '22 at 12:04
