
Hope someone can help with a slow Mongo query - it runs fine against smaller collections, but once we test it against the larger production collections it fails with the message "Not enough disk space", even though we limited the result set to 100.

I feel like there is an issue with the query structure and/or the indexes in place.

Both collections contain ~5 million records.

We need help to make this query fast.

// divide by 1000 because the ts field stores seconds, not JavaScript milliseconds
const startDate = (ISODate("2022-07-01T00:00:00.000Z").getTime()/1000)
const endDate = (ISODate("2022-08-10T00:00:00.000Z").getTime()/1000)
const clientId = xxxx
const ordersCollection = "orders"
const itemsCollection = "items"

db[ordersCollection].aggregate(
  [
      {
          $lookup: { 
             from: itemsCollection,
             localField: "data.id",
             foreignField: "data.orders_id",
             as: "item"
           }
           
      },
      {
          $unwind: "$item"
      },
      {
        $match: {"data.client_id": clientId}
      },
      {
        $match: {"item.ts": {$gt: startDate, $lt: endDate}}
      },
       
      {
        $project: {
            order_id: "$data.id",
            parent_id: "$data.parent_id",
            owner_id: "$data.owner_id",
            client_id: "$data.client_id",
            ts: "$item.ts",
            status: {
                $cond: {
                    if: { $eq: ["$item.data.status", 10] },
                    then: 3,
                    else: {
                        $cond: {
                            if: { $eq: ["$item.data.status", 4] },
                            then: 2,
                            else: "$item.data.status"
                        }
                    }
                }
            }
        }
      },
      
      {
        $group: {
            _id: { "order_id": "$order_id", "status": "$status" },
            order_id: { $first: "$order_id" },
            parent_id: { $first: "$parent_id" },
            owner_id: { $first: "$owner_id" },
            client_id: { $first: "$client_id" },
            ts: { $first: "$ts" },
            status: { $first: "$status" }
        }
      },
      
      {$sort: {"ts": 1}}
      
  ]
).limit(100).allowDiskUse(true)
  • Can you run the explain query and add its result here? – Deepak Patankar Aug 16 '22 at 04:23
  • For reference: https://www.mongodb.com/docs/v6.0/core/aggregation-pipeline-optimization/ – prasad_ Aug 16 '22 at 04:31
  • `.limit(100)` imposes a limit after the aggregation is computed, so it will have almost no effect on the time it takes. If you want to limit the aggregation after a certain stage you could use the [`$limit` aggregation stage](https://www.mongodb.com/docs/manual/reference/operator/aggregation/limit/) – apokryfos Aug 16 '22 at 05:09
  • Adding the result example: `{ "_id" : { "order_id" : NumberLong("22222"), "status" : NumberLong("0") }, "order_id" : NumberLong("22222"), "parent_id" : null, "owner_id" : NumberLong("65555"), "client_id" : NumberLong("65555"), "ts" : NumberLong("1657808900"), "status" : NumberLong("0") }` – Oleg Simakoff Aug 16 '22 at 17:20
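
For the explain suggested in the first comment, a minimal sketch using the shell's explain helper (the "executionStats" verbosity includes per-stage execution details):

db[ordersCollection].explain("executionStats").aggregate([ /* pipeline from above */ ])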

1 Answer


Try pulling the $match on the main collection up to the start of the pipeline.
This way you limit the number of documents you need to $lookup (otherwise the server tries to look up 5 million documents in another collection of 5 million documents).

Be sure to have an index on `data.client_id`.
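
A minimal sketch of the index setup, assuming the field names from the question; the compound index on the items collection is an extra assumption, meant to serve the $lookup's foreignField match (and possibly the ts range filter):

// supports the $match on data.client_id in the orders collection
db[ordersCollection].createIndex({ "data.client_id": 1 })

// assumed: supports the $lookup's equality match on data.orders_id;
// ts is appended in case the range filter can use it too
db[itemsCollection].createIndex({ "data.orders_id": 1, "ts": 1 })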

db[ordersCollection].aggregate(
  [
      {
        $match: {"data.client_id": clientId}
      },
      {
        $lookup: { 
           from: itemsCollection,
           localField: "data.id",
           foreignField: "data.orders_id",
           as: "item"
         }   
      },
      {
        $unwind: "$item"
      },
      {
        $match: {"item.ts": {$gt: startDate, $lt: endDate}}
      },
      ...

As a side note, limiting the result set to 100 does not help here, because the heaviest part - the aggregation with lookups and grouping - cannot be limited.
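
If only 100 rows are needed, a $limit stage after the $sort (as apokryfos noted in the comments) at least caps what the server returns, although the lookup and grouping above it still process every matching document. A sketch of the tail of the pipeline, with allowDiskUse passed as an aggregate() option instead of chained on the cursor:

db[ordersCollection].aggregate(
  [
      // ...same stages as above...
      { $sort: { "ts": 1 } },
      { $limit: 100 }
  ],
  { allowDiskUse: true }
)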

Pavel Bely