There is a deal collection containing 50 million documents with this structure:
{
    "user": <string>,
    "price": <number>
}
The collection has this index:
{
    "v" : 2,
    "unique" : false,
    "key" : {
        "user" : 1
    },
    "name" : "deal_user",
    "background" : true,
    "sparse" : true
}
I am trying to execute this aggregation:
db.getCollection('deal').aggregate([
    {
        $match: {user: {$in: [
            "u-4daf809bd",
            "u-b967c0a84",
            "u-ae03417c1",
            "u-805ce887d",
            "u-0f5160823",
            /* more items */
        ]}}
    },
    {
        $group: {
            _id: "$user",
            minPrice: {$min: "$price"}
        }
    }
])
When the array size inside $match / $in is less than ~50, the query response time is mostly under 2 seconds. For larger arrays (size > 200) the response time is around 120 seconds.
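For what it is worth, this is how I can inspect the winning plan and execution stats (a sketch; the ids here just stand in for the real list):

db.getCollection('deal').explain("executionStats").aggregate([
    {
        $match: {user: {$in: [
            "u-4daf809bd",
            "u-b967c0a84",
            /* more items */
        ]}}
    },
    {
        $group: {
            _id: "$user",
            minPrice: {$min: "$price"}
        }
    }
])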
I also tried chunking the array into parts of 10-50 elements and querying in parallel. There is a strange (reproducible) effect: most of the queries (~80%) respond fast (2-5 seconds), but some hang for ~100 seconds, so the parallelisation did not help.
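Roughly, the chunked variant looked like this (a minimal sketch using the Node.js driver; the connection string, database name and chunk size are illustrative, not my exact code):

// Split the user list into chunks and run one aggregation per chunk in parallel.
const { MongoClient } = require("mongodb");

async function minPricesChunked(users, chunkSize = 25) {
    const client = await MongoClient.connect("mongodb://localhost:27017");
    const coll = client.db("test").collection("deal");

    // Build chunks of `chunkSize` user ids.
    const chunks = [];
    for (let i = 0; i < users.length; i += chunkSize) {
        chunks.push(users.slice(i, i + chunkSize));
    }

    // Fire one aggregation per chunk concurrently.
    const results = await Promise.all(chunks.map(chunk =>
        coll.aggregate([
            { $match: { user: { $in: chunk } } },
            { $group: { _id: "$user", minPrice: { $min: "$price" } } }
        ]).toArray()
    ));

    await client.close();
    return results.flat();
}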
I guess there are some kind of "slow" and "normal" values, but I cannot explain what is going on, because they all belong to the same index and should be fetched in approximately the same time. Slowness correlates slightly with the number of duplicates in the user field (i.e. the number of documents per grouped value), but a large number of duplicates does not always entail "slow".
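To check that correlation, I count documents per user value with something like this (a sketch; the sort is only there to surface the heaviest keys first):

db.getCollection('deal').aggregate([
    {
        $match: {user: {$in: [
            /* same user ids as above */
        ]}}
    },
    {
        $group: {
            _id: "$user",
            docs: {$sum: 1}
        }
    },
    {
        $sort: {docs: -1}
    }
])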
Please help me understand why this MongoDB query behaves this way.
What is the proper way to run this kind of query?