I want to perform full-text search on a field for documents in a collection. However, I don't want to search in every document in the collection. That is, before full-text searching, I want to filter out documents, and then perform the full-text search only on the resulting documents.

Is it possible to build a search index and use the $search aggregation pipeline stage to efficiently filter out documents (e.g. filter out documents not containing a certain ObjectId in a certain field) and then perform full-text search on the other fields?

By efficient, I mean that the filter part of the query should run without scanning all indexed data, similar to how a database index on a given field avoids a collection scan.

To better explain my question, I’ll present a simplified scenario.

Consider a stores collection and a products collection. The document schema for both collections is as follows:

// Stores schema:
{
    _id: ObjectId,
    name: String
}

// Products schema:
{
    _id: ObjectId,
    name: String,
    store: ObjectId
}

Every product has a name and belongs to a store.
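For concreteness, documents in the two collections might look like this (the values are made-up placeholders):

// Hypothetical sample documents
// stores:
{ _id: ObjectId("64b0f0a1c2d3e4f5a6b7c8d9"), name: "Corner Bakery" }

// products:
{
    _id: ObjectId("64b0f0a1c2d3e4f5a6b7c8da"),
    name: "Sourdough Loaf",
    store: ObjectId("64b0f0a1c2d3e4f5a6b7c8d9")
}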

Suppose there is an application where the user is able to choose a store and then full-text search for products in that store by name.

To implement the search feature, I’d create the following search index:

{
    collectionName: 'products',
    name: 'productName_index',
    mappings: {
        dynamic: false,
        fields: {
            store: {
                type: "objectId",
            },
            name: [
                { type: "string" },
                { type: "autocomplete" }
            ]
        }
    }
}
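For completeness, the index can also be created programmatically. The following sketch assumes the MongoDB Node.js driver v6.0 or later (which exposes collection.createSearchIndex) and a placeholder database name; creating the index through the Atlas UI or CLI works just as well:

const { MongoClient } = require("mongodb")

const client = new MongoClient(process.env.MONGODB_URI)
const products = client.db("mydb").collection("products") // "mydb" is a placeholder

// Create the Atlas Search index defined above
await products.createSearchIndex({
    name: "productName_index",
    definition: {
        mappings: {
            dynamic: false,
            fields: {
                store: { type: "objectId" },
                name: [
                    { type: "string" },
                    { type: "autocomplete" }
                ]
            }
        }
    }
})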

And use the following aggregation pipeline (in Node.js) to query:

const { ObjectId } = require("mongodb")

// _id of the store chosen by the user (a fresh ObjectId here as a placeholder)
const storeId = new ObjectId()

const searchQuery = "someProductName"

// A pipeline is an array of stages, and $search must be the first stage
const pipeline = [
    { $search: {
        index: "productName_index",
        compound: {
            filter: [
                { equals: {
                    path: "store",
                    value: storeId
                }}
            ],
            should: [
                { text: {
                    path: "name",
                    query: searchQuery
                }},
                { autocomplete: {
                    path: "name",
                    query: searchQuery
                }}
            ],
            minimumShouldMatch: 1
        }
    }}
]
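To actually run the query (reusing the products collection handle from the index-creation sketch above):

const results = await products.aggregate(pipeline).toArray()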

This query ensures that the resulting products belong to a single store, and then performs full-text search on the name field. However, I suspect that all data indexed by productName_index is scanned to evaluate the filter.

If instead I were to use a compound database index, { store: 1, name: 1 }, I could use an aggregation pipeline with a $match stage to filter out products that do not belong to the chosen store without performing a collection scan. But then I would no longer be able to full-text search, since $search must be the first stage of the pipeline.
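For reference, here is a sketch of that alternative; the anchored prefix regex is only an illustration of what the compound index can support, not a substitute for full-text or autocomplete search:

// Compound database index alternative
await products.createIndex({ store: 1, name: 1 })

const matchPipeline = [
    { $match: {
        store: storeId,
        // An anchored prefix regex can still use the compound index,
        // but this is prefix matching, not full-text search
        name: { $regex: "^someProductName" }
    }}
]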

So then, would the above query scan every indexed store value? If so, is it possible to build a search index that supports this kind of query more efficiently?
