I'm relatively new to mongo and I have a collection that looks like this:
[
{
"stored": {
"box": [
{
"parcelId": "uwb1",
"status": "ACTIVE"
}
]
},
"checked": {
"box": [
{
"parcelId": "uwb1",
"status": "ACTIVE"
}
]
}
},
{
"stored": {
"box": [
{
"parcelId": "aqrf123",
"status": "PENDING"
}
]
},
"checked": {
"box": [
{
"parcelId": "aqrf123",
"status": "PENDING"
}
]
}
},
{
"checked": {
"box": [
{
"parcelId": "zuz873",
"status": "ACTIVE"
}
]
}
}
]
Some observations regarding the data:
- the document will always have the
checked
field but might not have thestored
field - the
checked
andstored
fields have the same schema - both will always have the
box
field and we can make the assumption that thebox
field will always have 1 element in the array (only 1, not more, not less) - the amount of documents in this collection is relatively high (~100 mil)
What I'm trying to achieve is to get the documents sorted by that status
field, which is like an enum, and it can have 3 values - ACTIVE
, PENDING
and REJECTED
.
- If for a document, the
stored
field exist, I'll take it from there and disregard thechecked
field. - Otherwise I'm going to have to take it from the
checked
field, which is guaranteed to exist, as previously mentioned. - An important requirement is to have the entire document returned to the consumer / client, so I cannot use
projection
to reduce the amount of data from the documents (which would probably make the whole operation faster).
How I've tried to achieve this is by using an aggregation which looks like this:
db.getCollection('entries')
.aggregate([{
$addFields: {
sortStatus: {
$ifNull: [{
$let: {
vars: {
box: {
$arrayElemAt: [
"$stored.box", 0
]
}
},
in: "$$box.status"
}
}, {
$let: {
vars: {
box: {
$arrayElemAt: [
"$checked.box", 0
]
}
},
in: "$$box.status"
}
}]
}
}
},
{
$sort: {
sortStatus: 1
}
}
], {
allowDiskUse: true
})
Which seems to do the job but it feels quite slow. Also there's the allowDiskUse
which makes me uncomfortable a bit. If I leave it out, I get the Sort exceeded memory limit of x bytes, but did not opt in to external sorting. Aborting operation. Pass allowDiskUse:true to opt in
error message.
So my questions are:
- Are there faster alternatives, be it with or without aggregation?
- Are there any risks in using the
allowDiskUse
option when doing an aggregation? - Would it be better (or is it the "mongo" way) to alter a bit the document structure and add that sortable field to the root of the document, add an index for it and just use
.sort({"statusField": 1})
? This would be the last resort option, as I'd have to migrate the existing data.