
I am getting familiar with Lucene and MongoDB Atlas Search, and I have a question about query efficiency.

Which of the two queries below uses fewer resources?

If there is a better query for the task below, please let me know.

I want to return all movies (sample_mflix) whose title matches a given value. The movies must be from a specific year (no movie from any other year should be returned), and I want only movies whose awards.nominations and awards.wins are greater than or equal to given values ($gte).

The first query seems more complex, which I suspect increases resource utilization (query complexity?). It also does not return movies from that year only, which makes me think there is a better way to do this with Atlas Search.

The second query uses $search and $match in separate stages. It runs a simple Lucene search (which might return more movies than the first query?), and the $match stage then filters the results. The second query is more precise: in my tests, it respects the year constraint. If I add a $limit stage, would this be a better solution?

If both queries were executed in the same scenario, which one would be more efficient, and why? (Apologies, the queries are written for the .NET driver.)

new BsonArray
{
    new BsonDocument("$search",
        new BsonDocument
        {
            { "index", "nostoreindex" },
            { "compound",
                new BsonDocument
                {
                    // Each clause type takes an array of operators;
                    // a BsonDocument cannot repeat the "must" or "should" key
                    { "must", new BsonArray
                        {
                            new BsonDocument("near",
                                new BsonDocument
                                {
                                    { "path", "year" },
                                    { "origin", 2000 },
                                    { "pivot", 1 }
                                }),
                            new BsonDocument("text",
                                new BsonDocument
                                {
                                    { "query", "poor" },
                                    { "path", "title" }
                                })
                        } },
                    { "should", new BsonArray
                        {
                            new BsonDocument("range",
                                new BsonDocument
                                {
                                    { "path", "awards.nominations" },
                                    { "gte", 1 }
                                }),
                            new BsonDocument("range",
                                new BsonDocument
                                {
                                    { "path", "awards.wins" },
                                    { "gte", 1 }
                                })
                        } }
                } }
        })
}

VS

var searchStage =
    new BsonDocument("$search",
        new BsonDocument
        {
            { "index", "nostoreindex" },
            { "text",
                new BsonDocument
                {
                    { "query", title },
                    { "path", "title" }
                } }
        });
var matchStage = new BsonDocument("$match",
    new BsonDocument("$and",
        new BsonArray
        {
            new BsonDocument("year", new BsonDocument("$eq", year)),
            new BsonDocument("awards.nominations", new BsonDocument("$gte", nominations)),
            new BsonDocument("awards.wins", new BsonDocument("$gte", awards))
        }));
Doug
  • 14,387
  • 17
  • 74
  • 104

1 Answer


When using Atlas Search, it is better to avoid following your $search stage with a $match filter. The $match stage can only run after every document returned by $search has been looked up by id in your mongod, which can be quite slow.

So, in general, try to keep your search and filters "in Lucene" where possible, to avoid that extra I/O and those extra comparisons.

In your case, you are using near, which does not filter anything: it returns all matching results, ordered by descending score according to how close year is to the origin. Use range instead, which filters those results and speeds up your query.

near is meant to score results higher the closer they are to a specific value, which can simulate a sort. For example, if you want results with more awards.wins to score higher, you could add near: { path: "awards.wins", origin: 10000, pivot: 1 }; the closer the value is to 10000, the higher the score.
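To see why near on year does not exclude other years, here is the numeric near scoring formula as I understand it from the Atlas Search near documentation (worth double-checking against the current reference), worked through for the question's origin: 2000, pivot: 1. Every year gets a positive score, so movies from 1999 or 1990 are ranked lower rather than removed:

```latex
\mathrm{score}(v) = \frac{\mathrm{pivot}}{\mathrm{pivot} + \lvert v - \mathrm{origin} \rvert}

\mathrm{score}(2000) = \frac{1}{1 + 0} = 1, \qquad
\mathrm{score}(1999) = \frac{1}{1 + 1} = 0.5, \qquad
\mathrm{score}(1990) = \frac{1}{1 + 10} \approx 0.091
```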

new BsonArray
{
    new BsonDocument("$search",
        new BsonDocument
        {
            { "index", "nostoreindex" },
            { "compound",
                new BsonDocument
                {
                    // Each clause type takes an array of operators;
                    // a BsonDocument cannot repeat the "must" or "should" key
                    { "must", new BsonArray
                        {
                            new BsonDocument("range",
                                new BsonDocument
                                {
                                    { "path", "year" },
                                    { "gte", 2000 },
                                    { "lte", 2000 }
                                }),
                            new BsonDocument("text",
                                new BsonDocument
                                {
                                    { "query", "poor" },
                                    { "path", "title" }
                                })
                        } },
                    { "should", new BsonArray
                        {
                            new BsonDocument("range",
                                new BsonDocument
                                {
                                    { "path", "awards.nominations" },
                                    { "gte", 1 }
                                }),
                            new BsonDocument("range",
                                new BsonDocument
                                {
                                    { "path", "awards.wins" },
                                    { "gte", 1 }
                                })
                        } }
                } }
        })
}
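One caveat with the query above: should clauses only raise the score of documents that match them, and when must clauses are present they are optional, so movies with zero nominations or wins can still come back. If the $gte conditions on awards.nominations and awards.wins must hold strictly (as the $match in the question's second query enforces), a possible variant moves them, together with the year range, into the compound filter clause, which is required but does not contribute to the score. This is a minimal sketch assuming the same "nostoreindex" index on sample_mflix.movies; the helper names RangeClause and BuildSearchStage are hypothetical, and the connection string in the comment is a placeholder.

```csharp
using System;
using MongoDB.Bson;

// Sketch only. Assumptions: an Atlas cluster with the sample_mflix dataset and a
// search index named "nostoreindex" on sample_mflix.movies that indexes title,
// year, awards.nominations and awards.wins.
public static class Program
{
    // Hypothetical helper: builds a range operator; the upper bound is optional.
    static BsonDocument RangeClause(string path, int gte, int? lte = null)
    {
        var bounds = new BsonDocument { { "path", path }, { "gte", gte } };
        if (lte.HasValue)
        {
            bounds.Add("lte", lte.Value);
        }
        return new BsonDocument("range", bounds);
    }

    // Hypothetical helper: every constraint lives inside $search, so no $match stage is needed.
    public static BsonDocument BuildSearchStage(string title, int year, int nominations, int wins)
    {
        return new BsonDocument("$search", new BsonDocument
        {
            { "index", "nostoreindex" },
            { "compound", new BsonDocument
                {
                    // must: required, and contributes to the relevance score
                    { "must", new BsonArray
                        {
                            new BsonDocument("text", new BsonDocument
                            {
                                { "query", title },
                                { "path", "title" }
                            })
                        } },
                    // filter: required, but does not contribute to the score
                    { "filter", new BsonArray
                        {
                            RangeClause("year", year, year),
                            RangeClause("awards.nominations", nominations),
                            RangeClause("awards.wins", wins)
                        } }
                } }
        });
    }

    public static void Main()
    {
        var searchStage = BuildSearchStage("poor", 2000, 1, 1);
        Console.WriteLine(searchStage.ToJson());

        // Running it needs the MongoDB.Driver package (using MongoDB.Driver;) and a real cluster:
        // var client = new MongoClient("<your Atlas connection string>");
        // var movies = client.GetDatabase("sample_mflix").GetCollection<BsonDocument>("movies");
        // var results = movies.Aggregate<BsonDocument>(new[] { searchStage }).ToList();
    }
}
```

Because filter clauses are not scored, this variant also keeps all the filtering "in Lucene", and it may be slightly cheaper than scoring the same conditions as must or should clauses.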