
On the recommendation of a MongoDB Atlas consultant, I am switching our application's live search feature from regex to Atlas Search. Here are the old and new routes:

Old live-search approach using regex:

router.get('/live-search/text/:text/regex', function (req, res) { // regex search
    try {
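        const text = req.params.text; // e.g. 's'
        // Escape regex metacharacters so input such as "(" is matched literally
        const escaped = text.replace(/[.*+?^${}()|[\]\\]/g, '\\$&');
        const queryFilters = { label: { $regex: escaped, $options: 'i' } };

        // Return the first 20 matches (find() without a sort has no guaranteed order)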
        db.gs__ptgc_selects
            .find(queryFilters)
            .limit(20)
            .then(data => res.json(data))
            .catch(err => res.status(400).json('Error: ' + err));
    } catch (error) {
        console.log('error: ', error);
        res.status(500).json({ statusCode: 500, message: error.message });
    }
});

New live-search approach using MongoDB Atlas Search:

router.get('/live-search/text/:text/atlas', function (req, res) { // atlas search
    try {
        let text = req.params.text;
        let queryFilters = [
            {
                $search: {
                    index: 'default_search', // optional, defaults to "default"
                    autocomplete: { query: `${text}`, path: 'label' } // "tokenOrder": "any|sequential", "fuzzy": <options>, "score": <options>
                }
            },
            { $limit: 20 }
        ];

        // And Return Top 20
        db.gs__ptgc_selects
            .aggregate(queryFilters)
            .then(data => res.json(data))
            .catch(err => res.status(400).json('Error: ' + err));
    } catch (error) {
        console.log('error: ', error);
        res.status(500).json({ statusCode: 500, message: error.message });
    }
});

For the new approach, we created a search index named default_search in the MongoDB Atlas UI. Here are the resulting mappings for that index:

{
    "mappings": {
        "dynamic": false,
        "fields": {
            "label": {
                "maxGrams": 5,
                "minGrams": 3,
                "tokenization": "nGram",
                "type": "autocomplete"
            }
        }
    },
    "storedSource": {
        "include": [
            "label"
        ]
    }
}

Simply put, the quality of the search results using Atlas Search with this index mapping is not as good as the quality of the results using regex. For reference, we are searching over the label field in a collection of 200,000 labels for basketball players, teams, and games, which looks like this:

search_over = [
  { _id: 'jadkfl', label: 'M: Stanford Cardinal', type: 'team' },
  { _id: 'afdacc', label: 'W: Stanford Cardinal', type: 'team' },
  { _id: 'adsjkf', label: 'Cameron Brink: Stanford', type: 'player' },
  { _id: 'aidjaf', label: 'M: 2023-02-03: Stanford vs Montana', type: 'game' },
  { _id: 'uiuass', label: 'Tam Stanford: Hood', type: 'player' },
  ...
]

Here is an example of search results for "stanfo" with regex:

[Screenshot of the regex results for "stanfo" (image not available)]

Here is an example of search results for "stanfo" with Atlas Search:

[Screenshot of the Atlas Search results for "stanfo" (image not available)]

Comparing these search results, my two biggest concerns with the new Atlas Search results are actually fairly minor:

  1. I would like the matching teams, M: Stanford Cardinal and W: Stanford Cardinal, to be the top two results. They are with regex, but not with Atlas Search.

  2. If I search for Stanford Ca, Atlas Search returns an empty array, presumably because the mapping sets minGrams to 3 and the second word, Ca, has only two letters. It still seems strange that Stanford Ca as a whole matches nothing.
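To make the minGrams presumption in item 2 concrete, here is a rough model of nGram tokenization in plain JavaScript. It assumes each label is lowercased, split on whitespace, and each word expanded into every substring of length minGrams to maxGrams; the function name nGrams is hypothetical, and Atlas Search's real analyzer may behave differently.

```javascript
// Rough model of nGram tokenization (assumption: lowercase + whitespace split;
// Atlas Search's actual analyzer may differ).
function nGrams(text, minGrams, maxGrams) {
  const grams = [];
  for (const word of text.toLowerCase().split(/\s+/).filter(Boolean)) {
    // Slide a window of each size n across the word.
    for (let n = minGrams; n <= maxGrams; n++) {
      for (let i = 0; i + n <= word.length; i++) {
        grams.push(word.slice(i, i + n));
      }
    }
  }
  return grams;
}

console.log(nGrams('Ca', 3, 5));              // []
console.log(nGrams('Ca', 1, 5));              // [ 'c', 'a', 'ca' ]
console.log(nGrams('Stanford', 3, 5).length); // 15
console.log(nGrams('Stanford', 1, 5).length); // 30
```

In this model a two-letter word like Ca produces no grams at all when minGrams is 3. Lowering minGrams to 1 doubles the grams for stanford (30 instead of 15), which hints at the index-size cost and at how many more documents very short fragments would match.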

Can I improve the /atlas route to sort results by the type field, returning teams first? And how can I ensure that Stanford Ca doesn't return an empty array? Is it safe to lower minGrams from 3 to 1?
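One possible approach to team-first ordering is to re-sort the $search output within the same aggregation: read the relevance score with { $meta: 'searchScore' }, map type to a numeric rank with $switch, then sort by rank and score. The sketch below builds such a pipeline; buildLiveSearchPipeline, TYPE_RANK, and the candidate pool of 200 are illustrative assumptions, not part of the original code, and the type values are taken from the sample data.

```javascript
// Hypothetical helper: builds a pipeline for the /atlas route that lists
// teams first, then players, then games, ordered by relevance within each type.
const TYPE_RANK = { team: 0, player: 1, game: 2 }; // assumed type values

function buildLiveSearchPipeline(text, limit = 20, candidates = 200) {
  return [
    {
      $search: {
        index: 'default_search',
        autocomplete: { query: text, path: 'label' },
      },
    },
    // By default $search emits documents in descending score order, so this
    // keeps the best-scoring candidates and bounds the cost of the re-sort.
    { $limit: candidates },
    {
      $addFields: {
        score: { $meta: 'searchScore' },
        // Unknown types get the largest rank and therefore sort last.
        typeRank: {
          $switch: {
            branches: Object.entries(TYPE_RANK).map(([type, rank]) => ({
              case: { $eq: ['$type', type] },
              then: rank,
            })),
            default: Object.keys(TYPE_RANK).length,
          },
        },
      },
    },
    { $sort: { typeRank: 1, score: -1 } },
    { $limit: limit },
  ];
}

console.log(JSON.stringify(buildLiveSearchPipeline('stanfo'), null, 2));
```

The route could then call db.gs__ptgc_selects.aggregate(buildLiveSearchPipeline(text)). Because the re-sort only sees the candidate pool, a team scoring below the first 200 hits would not be promoted. An Atlas-native alternative would be a compound query whose should clause boosts type: 'team', which would also require adding type to the index mappings.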

Canovice
  • This doesn't help you achieve your desired results, but I thought I'd comment to say that the regex isn't "_doing the right thing_" on purpose when it comes to ordering the results. I'm not sure whether an index is involved in your regex version, but the insertion order of the data is a major driver of the results that you are observing. Check [this playground](https://mongoplayground.net/p/umXEbqHGSCV), where I reorder how the data is created, which results in a different order of the returned results. – user20042973 Jun 02 '23 at 17:17
  • @user20042973 - yes, it very well could be related to the insertion order; I will look into that – Canovice Jun 02 '23 at 21:47
  • To be clear: unlike regex, Atlas Search _is_ going to return results in a specific order based on weighting/scoring, so insertion order is only relevant to the regex approach. I do think there are ways to influence the scoring, so you may be able to get what you want with Atlas Search; I'm just not familiar with it, so I will let others reply on that. But the ordering that you're getting for the regex is just 'luck', and regex queries for other values won't all return data the way you're observing in your example here – user20042973 Jun 02 '23 at 22:15
  • Yes, much appreciated for clarifying. Good to know I've been getting lucky rather than regex doing some ordering of its own. – Canovice Jun 04 '23 at 04:34
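Following up on the insertion-order point: if the regex route should also list teams first, that order has to be imposed explicitly. Below is a minimal in-application sketch, assuming the type values team, player, and game from the sample data; sortByType and TYPE_ORDER are hypothetical names.

```javascript
// Hypothetical post-processing for the regex route: find() without a sort
// returns documents in an unspecified order, so rank them explicitly.
const TYPE_ORDER = ['team', 'player', 'game']; // assumed values of the type field

function sortByType(docs) {
  const rank = (doc) => {
    const i = TYPE_ORDER.indexOf(doc.type);
    return i === -1 ? TYPE_ORDER.length : i; // unknown types go last
  };
  // Copy first so the caller's array is untouched; Array.prototype.sort is
  // stable (ES2019+), so documents of the same type keep their original order.
  return [...docs].sort((a, b) => rank(a) - rank(b));
}

const results = [
  { _id: 'adsjkf', label: 'Cameron Brink: Stanford', type: 'player' },
  { _id: 'aidjaf', label: 'M: 2023-02-03: Stanford vs Montana', type: 'game' },
  { _id: 'jadkfl', label: 'M: Stanford Cardinal', type: 'team' },
  { _id: 'afdacc', label: 'W: Stanford Cardinal', type: 'team' },
];
console.log(sortByType(results).map((d) => d._id));
// [ 'jadkfl', 'afdacc', 'adsjkf', 'aidjaf' ]
```

With .limit(20) applied first, this only reorders whichever 20 documents the query happened to return. A true team-first top 20 would need the ranking done server-side, for example in an aggregation with $match, $addFields, $sort, and then $limit.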
