I have a users collection and an accounts collection (simplified below)
db.users.insertMany([{
'_id': ObjectId('64595cf18ee18d1695940b70'),
'name': 'Link',
}, {
'_id': ObjectId('64595cf18ee18d1695940b71'),
'name': 'Navi',
}, {
'_id': ObjectId('64595cf18ee18d1695940b72'),
'name': 'Zelda',
}]);
db.accounts.insertMany([{
'_id': ObjectId('64595cf18ee18d1695940b80'),
'name': 'Ocarina of Time',
'users' : [
ObjectId('64595cf18ee18d1695940b70'),
ObjectId('64595cf18ee18d1695940b71'),
ObjectId('64595cf18ee18d1695940b72'),
]
}]);
db.accounts.createIndex({'users': 1});
account.users
is an array of all the users belonging to that account and a user can belong to multiple accounts. There is an index on the account.users field.
I need to get all the user records for a specific account. I have the account object, so I can query by account.users
like so:
let users_array = account.users;
db.users.find({_id: {$in: users_array}});
explain()
of previous query:
{
"queryPlanner" : {
"plannerVersion" : 1,
"namespace" : "mydb.users",
"winningPlan" : {
"stage" : "COLLSCAN"
}
},
"serverInfo" : {
"version" : "4.0.0"
},
"ok" : 1,
"operationTime" : Timestamp(1683606175, 1)
}
With 6K users in the database and almost all of them belonging to one account, the above query takes 8225 ms which seems terribly slow to me.
I am trying to reduce the time it takes to get the users by account.
I have tried using an aggregation pipeline like this using the account id:
db.users.aggregate([
{
$lookup: {
localField: '_id',
from: 'accounts',
foreignField: 'users',
as: 'account',
},
}, {
$match: { 'account._id': ObjectId("64595cf18ee18d1695940b80") },
}, {
$project: { name: 1 },
}
], {allowDiskUse: true});
This takes longer to run: 12278 ms.
explain
of previous query:
{
"queryPlanner" : {
"plannerVersion" : 1,
"namespace" : "mydb.users",
"winningPlan" : {
"stage" : "SUBSCAN",
"inputStage" : {
"stage" : "SORT_AGGREGATE",
"inputStage" : {
"stage" : "SORT",
"inputStage" : {
"stage" : "NESTED_LOOP_LOOKUP",
"inputStages" : [
{
"stage" : "PROJECTION",
"inputStage" : {
"stage" : "COLLSCAN"
}
},
{
"stage" : "FETCH",
"inputStage" : {
"stage" : "COLLSCAN"
}
}
]
}
}
}
}
},
"serverInfo" : {
"version" : "4.0.0"
},
"ok" : 1,
"operationTime" : Timestamp(1683606343, 1)
}
I also tried aggregating from the account to the users:
db.accounts.aggregate([
{ $match: { '_id': ObjectId("64595cf18ee18d1695940b80") } },
{
$lookup: {
localField: 'users',
from: 'users',
foreignField: '_id',
as: 'users',
},
},
{ $unwind: "$users" },
{
$replaceRoot: { newRoot: '$users' }
}
], {allowDiskUse: true});
This also takes longer: 11550 ms.
explain()
of previous query:
{
"queryPlanner" : {
"plannerVersion" : 1,
"namespace" : "mydb.accounts",
"winningPlan" : {
"stage" : "SUBSCAN",
"inputStage" : {
"stage" : "PROJECTION",
"inputStage" : {
"stage" : "HASH_AGGREGATE",
"inputStage" : {
"stage" : "NESTED_LOOP_LOOKUP",
"inputStages" : [
{
"stage" : "PROJECTION",
"inputStage" : {
"stage" : "FETCH",
"inputStage" : {
"stage" : "IXSCAN",
"indexName" : "_id_"
}
}
},
{
"stage" : "IXSCAN",
"indexName" : "_id_",
"direction" : "forward"
}
]
}
}
}
}
},
"serverInfo" : {
"version" : "4.0.0"
},
"ok" : 1,
"operationTime" : Timestamp(1683606261, 1)
}
When I use explain
on these aggregation queries, neither of them uses the asset.users
index.
How can I query this data mode quickly? What can I change to take advantage of the indexed field?