Per the recommendation of a MongoDB Atlas consultant, I am attempting to switch over from regex to Atlas Search for our application's live search feature. We have the following old and new routes for this:
Old live-search approach using regex:
router.get('/live-search/text/:text/regex', function (req, res) { // regex search
  try {
    let text = req.params.text; // e.g. 's'
    let queryFilters = { label: { $regex: `${text}`, $options: 'i' } };
    // And Return Top 20
    db.gs__ptgc_selects
      .find(queryFilters)
      .limit(20)
      .then(data => res.json(data))
      .catch(err => res.status(400).json('Error: ' + err));
  } catch (error) {
    console.log('error: ', error);
    res.status(500).json({ statusCode: 500, message: error.message });
  }
});
New live-search approach using MongoDB Atlas's Atlas Search:
router.get('/live-search/text/:text/atlas', function (req, res) { // atlas search
  try {
    let text = req.params.text;
    let queryFilters = [
      {
        $search: {
          index: 'default_search', // optional, defaults to "default"
          autocomplete: { query: `${text}`, path: 'label' } // "tokenOrder": "any|sequential", "fuzzy": <options>, "score": <options>
        }
      },
      { $limit: 20 }
    ];
    // And Return Top 20
    db.gs__ptgc_selects
      .aggregate(queryFilters)
      .then(data => res.json(data))
      .catch(err => res.status(400).json('Error: ' + err));
  } catch (error) {
    console.log('error: ', error);
    res.status(500).json({ statusCode: 500, message: error.message });
  }
});
For the new approach, we created a default_search search index in the MongoDB Atlas UI, and here are the resulting mappings for that default_search index:
{
  "mappings": {
    "dynamic": false,
    "fields": {
      "label": {
        "maxGrams": 5,
        "minGrams": 3,
        "tokenization": "nGram",
        "type": "autocomplete"
      }
    }
  },
  "storedSource": {
    "include": [
      "label"
    ]
  }
}
Simply put, the search results from Atlas Search with this index mapping are not as good as the results from regex. For reference, we are searching over the label field in a collection with 200,000 labels of basketball players, teams, and games that looks like this:
search_over = [
  { _id: 'jadkfl', label: 'M: Stanford Cardinal', type: 'team' },
  { _id: 'afdacc', label: 'W: Stanford Cardinal', type: 'team' },
  { _id: 'adsjkf', label: 'Cameron Brink: Stanford', type: 'player' },
  { _id: 'aidjaf', label: 'M: 2023-02-03: Stanford vs Montana', type: 'game' },
  { _id: 'uiuass', label: 'Tam Stanford: Hood', type: 'player' },
  ...
]
Here is an example of search results for stanfo with regex
Here is an example of search results for stanfo with Atlas Search
As I review this entire post and compare these search results, the 2 biggest concerns I have with the new Atlas Search results are actually somewhat minor:
1. I prefer the matching teams, M: Stanford Cardinal and W: Stanford Cardinal, to be the top 2 results, which they are with regex but not with Atlas Search.
2. If I search for Stanford Ca, Atlas Search returns an empty array, presumably because the mappings set minGrams to 3 and Ca, the second word, has only two letters. It still seems strange that all of Stanford Ca matches nothing.
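To make my guess about minGrams concrete, here is a rough sketch of nGram tokenization as I understand it. This is not Atlas Search's actual implementation: the helper name nGrams, the lowercasing, and the split on whitespace are all my own assumptions, and the real analyzer may behave differently.

```javascript
// Hypothetical helper: a simplified model of nGram tokenization,
// NOT the actual Atlas Search internals.
function nGrams(text, minGrams, maxGrams) {
  const grams = [];
  // Assumption: lowercase the input and treat each whitespace-separated word on its own.
  for (const word of text.toLowerCase().split(/\s+/).filter(Boolean)) {
    for (let size = minGrams; size <= maxGrams; size++) {
      // Slide a window of `size` characters across the word.
      for (let start = 0; start + size <= word.length; start++) {
        grams.push(word.slice(start, start + size));
      }
    }
  }
  return grams;
}

console.log(nGrams('Ca', 3, 5)); // no grams: the word is shorter than minGrams
console.log(nGrams('Ca', 1, 5)); // grams of length 1 and 2 become available
```

Under this simplified model, a two-letter word produces no tokens at all when minGrams is 3, which would be consistent with the empty result for Stanford Ca if every query word has to match.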
Can I improve the /atlas route to sort results by the type field, returning team first? Also, how can I ensure that Stanford Ca doesn't return an empty array? Is it safe to lower minGrams from 3 to 1?
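To show the ordering I am after, here is a minimal sketch that reorders results in plain JavaScript after the query returns. The helper name teamsFirst is hypothetical, and this only reorders the 20 documents that came back after $limit, so it is an illustration of the desired output rather than a fix inside the search pipeline itself.

```javascript
// Hypothetical post-processing helper: puts documents with type 'team' first.
function teamsFirst(results) {
  const rank = doc => (doc.type === 'team' ? 0 : 1);
  // Array.prototype.sort is stable (ES2019+), so the original search order
  // is preserved within the team group and within the non-team group.
  return [...results].sort((a, b) => rank(a) - rank(b));
}

// Sample documents taken from the collection excerpt above, in an arbitrary order.
const sample = [
  { _id: 'adsjkf', label: 'Cameron Brink: Stanford', type: 'player' },
  { _id: 'aidjaf', label: 'M: 2023-02-03: Stanford vs Montana', type: 'game' },
  { _id: 'jadkfl', label: 'M: Stanford Cardinal', type: 'team' },
  { _id: 'afdacc', label: 'W: Stanford Cardinal', type: 'team' }
];

console.log(teamsFirst(sample).map(doc => doc.label));
```

In the route, this would be called as res.json(teamsFirst(data)) in place of res.json(data).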