You can use the aggregation framework with the $redact
and $out
pipelines to remove the documents which satisfy the given condition.
The $redact
pipeline incorporates the functionality of $project
and $match
to implement field level redaction where it will return all documents matching the condition using $$KEEP
and removes those that don't match using the $$PRUNE
variable.
The $out
pipeline writes the resulting documents of the aggregation pipeline to the same collection thus essentially doing an update operation.
Running the following aggregate operation will remove the documents:
db.collection.aggregate([
{
"$redact": {
"$cond": [
{
"$lte": [
new Date(),
{
"$add": [
"$lastUpdatedTimestamp",
{ "$multiply": ["$TTLinSeconds", 1000] }
]
}
]
},
"$$KEEP",
"$$PRUNE"
]
}
},
{ "$out": "collection" }
])
As caveat, using the $where
operator won't perform very well as it takes a while to process the query because of what MongoDB is doing behind the scenes: when you perform a normal (non-$where
) query, your client turns that query into BSON and sends it to the database. MongoDB stores data in BSON, too, so it can basically compare your query directly against the data. This is very fast and efficient.
On the other hand, when you have a $where
clause that must be executed as part of your query, MongoDB will have to create a JavaScript object for every document in the collection, parsing the documents' BSON and adding all of their fields to the JavaScript objects. It then executes the JavaScript you sent against the documents, then tears it all down again. This is an extremely time and resource-intensive operation as it calls the JavaScript engine to evaluate Javascript code on every document and checks the condition for each. It is advisable to combine with indexed queries if you can so that the query may be faster.
Some considerations you have to look at when using $where
:
Do not use global variables.
$where
evaluates JavaScript and cannot take advantage of indexes.
Therefore, query performance improves when you express your query
using the standard MongoDB operators (e.g., $gt
, $in
). In general, you
should use $where
only when you can’t express your query using another
operator. If you must use $where
, try to include at least one other
standard query operator to filter the result set. Using $where
alone
requires a table scan. Using normal non-$where
query statements
provides the following performance advantages:
MongoDB will evaluate non-$where
components of query before $where
statements. If the non-$where
statements match no documents, MongoDB
will not perform any query evaluation using $where
. The non-$where
query statements may use an index.
$where
is a good hack when necessary, but it should be avoided whenever possible. If a $where
query is needed, you can cut down on the performance hit by minimising the number of documents that make it to the $where
or create an additional computed denormalized field say expiryDate
that is the sum of the lastUpdatedTimestamp
and the TTLinSeconds
fields which you can then query as:
db.collection.remove({ "expiryDate": { "$lt": new Date() }});
But still, such low-selectivity fields don't yield good index performance if the collection is very large thus the candidate set for indexing is large with this approach.