
I have a collection whose documents have this structure:

{
  model: {
    name: 'string name'
  }
}

I have enabled Atlas Search and created a search index on the model.name field. Search works fine, but I can't get results when the query contains only the first few letters of the value.

Example:

I have this document:

{
  model: {
     name: "space1duplicate"
  }
}

If I query space, I don't get the result:

{
  index: 'search_index',
  compound: {
    must: [
      {
        text: {
          query: 'space',
          path: 'model.name'
        }
      }
    ]
  }
}

But if I query space1duplica, it returns the result.

BadPiggie

2 Answers


During indexing, a full-text search engine tokenizes the input, splitting the text into searchable chunks. A text query matches only when it produces a token that equals one of the indexed tokens. Check out the relevant section in the documentation.
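To make the tokenization point concrete, here is a minimal JavaScript sketch. It only approximates a word-boundary tokenizer (split on anything that is not a letter or digit, then lowercase); it is not Lucene's actual implementation, and the helper names `tokenize` and `textMatches` are hypothetical.

```javascript
// Rough stand-in for a word-boundary tokenizer. Assumption: letters and
// digits stay together in one token, which is the behaviour described above.
function tokenize(text) {
  return text
    .toLowerCase()
    .split(/[^\p{L}\p{N}]+/u)
    .filter((token) => token.length > 0);
}

// A text query matches only if one of its tokens equals an indexed token.
function textMatches(query, fieldValue) {
  const indexed = new Set(tokenize(fieldValue));
  return tokenize(query).some((token) => indexed.has(token));
}

console.log(tokenize("space1duplicate"));                       // a single token
console.log(textMatches("space", "space1duplicate"));           // no equal token
console.log(textMatches("space1duplicate", "space1duplicate")); // exact token
```

Under this simplified model the whole value is stored as one token, so a query for space has nothing to match against.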

By default, Atlas Search does not split words at digits. If you need that, define a custom analyzer with the regexSplit tokenizer and use it for your field:

{
  "mappings": {
    "dynamic": false,
    "fields": {
      "name": [
        {
          "analyzer": "digitSplitter",
          "type": "string"
        }
      ]
    }
  },
  "analyzers": [
    {
      "charFilters": [],
      "name": "digitSplitter",
      "tokenFilters": [],
      "tokenizer": {
        "pattern": "[0-9]+",
        "type": "regexSplit"
      }
    }
  ]
}

Also note that you can use multiple analyzers for string fields, if needed.
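As a rough illustration of what the digitSplitter analyzer above would produce, the following JavaScript sketch splits on the same `[0-9]+` pattern. The function name `digitSplit` is hypothetical, and the sketch assumes no further normalization, since the analyzer definition has empty charFilters and tokenFilters.

```javascript
// Split on runs of digits, mirroring the "[0-9]+" pattern of the analyzer.
// No lowercasing is applied because the analyzer defines no token filters.
function digitSplit(text) {
  return text.split(/[0-9]+/).filter((token) => token.length > 0);
}

const tokens = digitSplit("space1duplicate");
console.log(tokens);                   // the parts on either side of the digit
console.log(tokens.includes("space")); // a query for "space" now has a token to match
```

Because the sketch does not lowercase, a value such as "Space1Duplicate" would yield tokens that keep their original case.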

qwerty

Atlas Search uses Lucene to do the job. The documentation on the MongoDB site focuses mostly on the MongoDB-specific syntax for passing queries to Lucene, and it can be a bit confusing if you are not familiar with Lucene's query language.

First of all, there are a number of tokenizers and analyzers available, and each serves a specific purpose. You really need to include the index definition when you ask questions about Atlas Search.

The default analyzer splits text on word separators to build the index. Language-specific analyzers additionally strip word endings to store stems, depending on the configured language.

So, in order to find "space1duplicate" by the beginning of the word, you can use the "autocomplete" field type with nGram tokenization. The index should be created as follows:

{
  "mappings": {
    "dynamic": false,
    "fields": {
      "name": {
        "tokenization": "nGram",
        "type": "autocomplete"
      }
    }
  },
  "storedSource": {
    "include": [
      "name"
    ]
  }
}
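To see why nGram tokenization makes a short query such as spa matchable, here is a small JavaScript sketch that enumerates the substrings an n-gram tokenizer would index. The function name `nGrams` is hypothetical, and the bounds of 2 and 15 characters are assumed values for illustration; the index definition above does not set minGrams or maxGrams explicitly. Case and diacritic normalization are ignored.

```javascript
// Collect every substring whose length is between minGrams and maxGrams.
// The bounds are assumptions for this sketch, not values taken from the index above.
function nGrams(text, minGrams = 2, maxGrams = 15) {
  const grams = new Set();
  for (let start = 0; start < text.length; start++) {
    for (let len = minGrams; len <= maxGrams && start + len <= text.length; len++) {
      grams.add(text.slice(start, start + len));
    }
  }
  return grams;
}

const grams = nGrams("space1duplicate");
console.log(grams.has("spa"));     // a short prefix is indexed as its own token
console.log(grams.has("duplica")); // so is a substring from the middle of the value
console.log(grams.has("s"));       // shorter than minGrams, so not indexed
```

The trade-off is index size: every value contributes many tokens instead of one.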

Once it's indexed (you may need to wait a bit if you have a larger dataset), you can find the document with the following search:

{
  index: 'search_index',
  compound: {
    must: [
      {
        autocomplete: {
          query: 'spa',
          path: 'name'
        }
      }
    ]
  }
}
silent-box
Alex Blex
  • `@Alex` In your `index` you have added the field `name` to `storedSource`. But the documentation says **Atlas Search doesn't index stored fields and so you can't query these fields**. Still, I can query `name` even though I added it to `storedSource`. I am confused now. – BadPiggie Aug 03 '22 at 10:55
  • @BadPiggie I can say! I am confused with the comment itself =). Is the answer wrong, or what are you trying to say? – Alex Blex Aug 03 '22 at 11:10
  • No, your answer works perfectly. But the [Documentation of Stored Source](https://www.mongodb.com/docs/atlas/atlas-search/stored-source-definition/#std-label-fts-stored-source-definition) says `Atlas Search doesn't index stored fields and so you can't query these fields.` In our case, the `name` field is defined in `storedSource`, yet I can still query it. That is why I am confused about `Stored Source`. – BadPiggie Aug 03 '22 at 11:32
  • @BadPiggie It queries ngrams of the `fields`. `StoredSource` there is to return the string straight from Lucene without fetching the document from the database. It's autocompletion - name is the only string that matters, so you can do `"returnStoredSource": true` in the text search query. – Alex Blex Aug 03 '22 at 12:04
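For reference, the `returnStoredSource` option mentioned in the last comment could be combined with the autocomplete query roughly as sketched below. This is only an illustration: the helper `buildAutocompletePipeline`, the `$limit` of 10, and the collection you run it against are assumptions, while the index name and path are taken from the answer above.

```javascript
// Build a $search pipeline that queries the autocomplete index and asks
// Atlas Search to return the stored fields instead of the full documents.
function buildAutocompletePipeline(prefix) {
  return [
    {
      $search: {
        index: "search_index",
        autocomplete: { query: prefix, path: "name" },
        returnStoredSource: true,
      },
    },
    { $limit: 10 }, // arbitrary cap chosen for this sketch
  ];
}

const pipeline = buildAutocompletePipeline("spa");
console.log(JSON.stringify(pipeline, null, 2));
// In mongosh or a driver, this would be passed to aggregate() on your collection.
```

The query still runs against the n-grams of the indexed field; the stored source only determines what is returned.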