2

I have a users collection and an accounts collection (simplified below)

db.users.insertMany([{
    '_id': ObjectId('64595cf18ee18d1695940b70'),
    'name': 'Link',
  }, {
    '_id': ObjectId('64595cf18ee18d1695940b71'),
    'name': 'Navi',
  }, {
    '_id': ObjectId('64595cf18ee18d1695940b72'),
    'name': 'Zelda',
}]);

db.accounts.insertMany([{
    '_id': ObjectId('64595cf18ee18d1695940b80'),
    'name': 'Ocarina of Time',
    'users' : [
      ObjectId('64595cf18ee18d1695940b70'),
      ObjectId('64595cf18ee18d1695940b71'),
      ObjectId('64595cf18ee18d1695940b72'),
    ]
}]);

db.accounts.createIndex({'users': 1});

account.users is an array of all the users belonging to that account and a user can belong to multiple accounts. There is an index on the account.users field.

I need to get all the user records for a specific account. I have the account object, so I can query by account.users like so:

let users_array = account.users;
db.users.find({_id: {$in: users_array}});

explain() of previous query:

{
  "queryPlanner" : {
    "plannerVersion" : 1,
    "namespace" : "mydb.users",
    "winningPlan" : {
      "stage" : "COLLSCAN"
    }
  },
  "serverInfo" : {
    "version" : "4.0.0"
  },
  "ok" : 1,
  "operationTime" : Timestamp(1683606175, 1)
}

With 6K users in the database and almost all of them belonging to one account, the above query takes 8225 ms which seems terribly slow to me.

I am trying to reduce the time it takes to get the users by account.

I have tried using an aggregation pipeline like this using the account id:

db.users.aggregate([
  {
    $lookup: {
      localField: '_id',
      from: 'accounts',
      foreignField: 'users',
      as: 'account',
    },
  }, {
  $match: { 'account._id': ObjectId("64595cf18ee18d1695940b80") },
  }, {
  $project: { name: 1 },
  }
], {allowDiskUse: true});

This takes longer to run: 12278 ms.

explain of previous query:

{
  "queryPlanner" : {
    "plannerVersion" : 1,
    "namespace" : "mydb.users",
    "winningPlan" : {
      "stage" : "SUBSCAN",
      "inputStage" : {
        "stage" : "SORT_AGGREGATE",
        "inputStage" : {
          "stage" : "SORT",
          "inputStage" : {
            "stage" : "NESTED_LOOP_LOOKUP",
            "inputStages" : [
              {
                "stage" : "PROJECTION",
                "inputStage" : {
                  "stage" : "COLLSCAN"
                }
              },
              {
                "stage" : "FETCH",
                "inputStage" : {
                  "stage" : "COLLSCAN"
                }
              }
            ]
          }
        }
      }
    }
  },
  "serverInfo" : {
    "version" : "4.0.0"
  },
  "ok" : 1,
  "operationTime" : Timestamp(1683606343, 1)
}

I also tried aggregating from the account to the users:

db.accounts.aggregate([
  { $match: { '_id': ObjectId("64595cf18ee18d1695940b80") } },
  {
    $lookup: {
      localField: 'users',
      from: 'users',
      foreignField: '_id',
      as: 'users',
    },
  },
  { $unwind: "$users" },
  {
    $replaceRoot: { newRoot: '$users' }
  }
], {allowDiskUse: true});

This also takes longer: 11550 ms.

explain() of previous query:

{
  "queryPlanner" : {
    "plannerVersion" : 1,
    "namespace" : "mydb.accounts",
    "winningPlan" : {
      "stage" : "SUBSCAN",
      "inputStage" : {
        "stage" : "PROJECTION",
        "inputStage" : {
          "stage" : "HASH_AGGREGATE",
          "inputStage" : {
            "stage" : "NESTED_LOOP_LOOKUP",
            "inputStages" : [
              {
                "stage" : "PROJECTION",
                "inputStage" : {
                  "stage" : "FETCH",
                  "inputStage" : {
                    "stage" : "IXSCAN",
                    "indexName" : "_id_"
                  }
                }
              },
              {
                "stage" : "IXSCAN",
                "indexName" : "_id_",
                "direction" : "forward"
              }
            ]
          }
        }
      }
    }
  },
  "serverInfo" : {
    "version" : "4.0.0"
  },
  "ok" : 1,
  "operationTime" : Timestamp(1683606261, 1)
}

When I use explain on these aggregation queries, neither of them uses the asset.users index.

How can I query this data mode quickly? What can I change to take advantage of the indexed field?

Joel Harris
  • 1,966
  • 3
  • 20
  • 32
  • Can you share the verbose (`'allPlansExecution'`) explain output for the queries? The first and third should definitely use the default index in `_id`. The second should use the `users` index but that is not as obvious when looking at explain output. Also, what's the target response time here? – user20042973 May 08 '23 at 23:19
  • Should have mentioned that this is aws-documentdb. I've added the regular `explain()` output since `allPlansExecution` seems to be unsupported by documentdb. – Joel Harris May 09 '23 at 04:30
  • apparently, that's a known issue of documentdb. mongodb won't have the issue. https://skyworkz.nl/blog/what-they-did-not-tell-you-about-amazon-documentdb/ – mask8 May 09 '23 at 05:56

0 Answers0