i am trying to speed up a query that im performing to a collection that contains more than 10 million of documents. An example of document looks like this
{
nMove: 2041242,
typeMove: 'Sold',
date: "2016-05-18T16:00:00Z",
operation: 'output',
origin: {
id: '3234fds32fds42',
name: 'Main storage',
},
products: [{
id: '342fmdsff23324432',
name: 'Product 1',
price: 34,
quantity: 9
}],
}
Now i have to query all the documents that $match with a given 'product.id' or with 'origin.id' or both, and $sum the quantity of product.quantity in total.
So i am performing a query like this.
movesModel.aggregate([
{
$match: {
$expr: {
$and: [
{ $in: [req.params.idProduct, '$product.id'] },
{ $eq: ['$origin.id', req.params.idOrigin }] },
]
}
}
},
{
$project: {
_id: 0,
outputs: {
$sum: {
$cond: { if: { $eq: ['$operation', 'input'] }, then: '$product.quantity', else: 0 }
}
},
inputs: {
$sum: {
$cond: { if: { $eq: ['$operation', 'output'] }, then: '$product.quantity', else: 0 }
}
}
}
},
{
$group: {
_id: '$_id',
inputs: { $sum: '$inputs' },
outputs: { $sum: '$outputs' }
}
},
]).then((result) => {
res.json(result)
})
This query is taking like about 1 minute to be resolved... Sometimes this query $match with more than 200k documents... considering that i don't need the whole data, i just need the sum of quantity... i have some questions... ( I am mongodb noob )
About indexes.. i created a compound index db.moves.createIndex({ 'origin.id': 1, 'product.id':1}). Is it correct? should i change it?
is my query ok? can i improve it?
To prevent that a query match with 200k documents... i did something tricky. i added a field called 'date', and i would like to take all the documents that match with the 'origin.id', 'product.id' and is $gte: date , but it takes the same time... even when it only match 1 document...
Finishing... i think, that all the problem that i have is about indexes.. so i tried to check my indexStats... but it seems to be not working with my aggregation query.
Any help is appreciated. thank you
////////////FULL PIPELINE////////////
In this case i have two more collections called 'storages' and 'inventories'
//storage examples
{
_id: '3234fds32fds42'
name: 'Main storage'
status: true
}
{
_id: '32f32f32432sda'
name: 'Other storage'
status: true
}
//invetories examples
{
_id: 'fvavcsa3a3aa3'
date: '2020-01-01'
storage: {
_id: '3234fds32fds42'
name: 'Main storage'
}
products: [{
id: '342fmdsff23324432',
name: 'Product 1',
}],
}
So this is why i was using $lookup, what i really need is get all moves, that match with each storage and product.
//also i added invetories to filter by date and prevent to match tons of documents
So this is my query.
storagesModel.aggregation([
{
$match: { status: true }
},
{
$lookup: {
from: 'inventories',
as: 'inventory',
let: { "idStorage": "$_id" },
pipeline: [
{
$match: {
$expr: {
$and: [
{ $eq: ['$storage._id', { $toString: "$$idStorage" }] },
{ $in: [req.params.id, '$products._id'] }
]
}
},
},
{
$sort: { date: -1 } // TO TAKE THE LAST INVENTORY
},
{
$limit: 1
}
]
}
},
{ $unwind: { path: '$inventories', preserveNullAndEmptyArrays: true } }, //DECONSTRUCT THE ARRAY AND GET IT AS OBJECT
{
$lookup: {
from: 'moves',
as: 'moves',
let: {
"idStorage": "$_id",
'date': '$inventory.date'},
pipeline: [
{
$match: {
$expr: {
$and: [
{ $gte: ['$date', $$date] }
{ $eq: ['$origin.id', '$$idStorage' }] },
{ $in: [req.params.idProduct, '$product.id'] },
]
}
}
},
{
$project: {
_id: 0,
outputs: {
$sum: {
$cond: { if: { $eq: ['$operation', 'input'] }, then: '$product.quantity', else: 0 }
}
},
inputs: {
$sum: {
$cond: { if: { $eq: ['$operation', 'output'] }, then: '$product.quantity', else: 0 }
}
}
}
},
{
$group: {
_id: '$_id',
inputs: { $sum: '$inputs' },
outputs: { $sum: '$outputs' }
}
},
]
}
])
Summarizing...
What i need to get, is the total of 'moves' that a product has in every storage. Taking in to account that maybe there is an 'inventory' that can give you a date to prevent match a lot of documents. So that why i am using the storageModel and use $lookup stage for get the last inventory, so i have the $$date and $$idStorage. then i use $lookup from 'moves' collection... well i know is a heavy query but... i thought that giving a date and with a proper compound index it should be fast... but even when i try to get the 'moves' for a product that has only few 'moves'... it takes like 20 or 30 seconds...
I tried doing a single query without $lookup and even matching with 400k documents it takes 1-2 sec...
what do you think about? Thanks for your help