I want to perform full-text search on a field for documents in a collection. However, I don't want to search in every document in the collection. That is, before full-text searching, I want to filter out documents, and then perform the full-text search only on the resulting documents.
Is it possible to build a search index and use the $search aggregation pipeline stage to efficiently filter out documents (e.g. exclude documents that do not contain a certain ObjectId in a certain field), and then perform full-text search on other fields?
By efficient, I mean that the filter part of the query should run without scanning all indexed data, similar to how a database index on a given field avoids a collection scan.
To better explain my question, I’ll present a simplified scenario.
Consider a stores collection and a products collection. The document schema for both collections is as follows:
// Stores schema:
{
  _id: ObjectId,
  name: String
}

// Products schema:
{
  _id: ObjectId,
  name: String,
  store: ObjectId
}
Every product has a name and belongs to a store.
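For concreteness, hypothetical example documents could look like this (the hex strings are placeholder ObjectId values shown in string form):

```javascript
// Hypothetical example documents; the hex strings are made-up
// placeholder ObjectId values.
const store = {
  _id: "66a1f0c2b3d4e5f6a7b8c9d0",
  name: "Downtown Grocery"
}

const product = {
  _id: "66a1f0c2b3d4e5f6a7b8c9d1",
  name: "Organic Apples",
  store: store._id // the _id of the store this product belongs to
}
```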
Suppose there is an application where the user can choose a store and then full-text search for products in that store by name.
To implement the search feature, I’d create the following search index:
{
  collectionName: 'products',
  name: 'productName_index',
  mappings: {
    dynamic: false,
    fields: {
      store: {
        type: "objectId"
      },
      name: [
        { type: "string" },
        { type: "autocomplete" }
      ]
    }
  }
}
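As an aside, the same index can also be expressed as a driver-side definition. This is a sketch under the assumption that the MongoDB Node.js driver (6.0+) is used, where createSearchIndex takes a { name, definition } object rather than the collectionName wrapper shown above; Atlas Search indexes can equally be created from the Atlas UI or CLI.

```javascript
// Sketch: the same search index as a driver-side definition
// (assumes MongoDB Node.js driver >= 6.0 and an Atlas cluster).
const searchIndex = {
  name: "productName_index",
  definition: {
    mappings: {
      dynamic: false,
      fields: {
        store: { type: "objectId" },
        name: [{ type: "string" }, { type: "autocomplete" }]
      }
    }
  }
}

// Assuming `client` is a connected MongoClient against an Atlas cluster:
// await client.db("mydb").collection("products").createSearchIndex(searchIndex)
```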
And use the following aggregation pipeline (in Node.js) to query:
const { ObjectId } = require("mongodb")

// Known store _id; in a real app this comes from the user's store
// selection, new ObjectId() is just a stand-in value here
const storeId = new ObjectId()
const searchQuery = "someProductName"

// $search must be the first stage of the pipeline
const pipeline = [
  {
    $search: {
      index: "productName_index",
      compound: {
        filter: [
          {
            equals: {
              path: "store",
              value: storeId
            }
          }
        ],
        should: [
          { text: { path: "name", query: searchQuery } },
          { autocomplete: { path: "name", query: searchQuery } }
        ],
        minimumShouldMatch: 1
      }
    }
  }
]

// e.g. await db.collection("products").aggregate(pipeline).toArray()
This query ensures the resulting products belong to a single store, and then performs full-text search on the name field. However, I suspect that all indexed data for the productName_index is scanned.
If instead I were to use a compound database index { store: 1, name: 1 }, I could use an aggregation pipeline with a $match stage to filter out products that do not belong to the selected store without performing a collection scan. But then I would no longer be able to full-text search.
So then, would the above $search query scan every indexed store field? If so, is it possible to build a search index that supports this kind of query more efficiently?