Delete documents by date comparison between sum of date and numeric fields and current timestamp

Question

The documents in the MongoDB collection have the following schema (showing only relevant fields):

{
    "TTLinSeconds" : 1800,
    "lastUpdatedTimestamp" : ISODate("...")
}

I need to delete all documents where current timestamp is greater than lastUpdatedTimestamp plus the value stored in the TTLinSeconds field. Thanks in advance for your suggestions.

Sourabh Bhagat · Answer 1 · 2017-02-06T10:45:12.447

6

This might work (to delete field between 2017-01-25T00:30:00Z and 2017-01-26T23:59:00Z ):

  db.collectionName.remove({
       $and : [
        {"lastUpdatedTimestamp": { 
           $gte:   ISODate("2017-01-25T00:30:00Z"), 
           $lt :   ISODate("2017-01-26T23:59:00Z")
       },
       {"TTLinSeconds" : value}
      ]
    }

Suggestion:

I would recommend you to use the common naming convention for your variables.

Reference :

https://docs.mongodb.com/manual/reference/operator/update/currentDate/

edited Feb 06 '17 at 10:45

answered Feb 06 '17 at 04:56

Sourabh Bhagat

1,691
17
20

I am trying to solve the same issue as OP, but I don't see how your answer could be used to solve OP's issue of removing where `lastUpdatedTimestamp + TTLinSeconds < now()`. Do you think you could explain how I could use your answer to achieve that? – Michał Mar 30 '18 at 23:16

score 2 · Answer 2 · answered Feb 06 '17 at 05:01

2

You could do this using $where as:

db.collectionName.remove({$where:"new Date().getTime() > this.lastUpdatedTimestamp.getTime() + (this.TTLinSeconds * 1000)"})

answered Feb 06 '17 at 05:01

oblivion

5,928
3
34
55

Thanks! This is great, I like the simplicity of this solution. I didn't know that `this` keyword can be used in MongoDB. One question though: how is this solution different from the one provided by chridam? Maybe there are possible performance issues possible with one or another? – Jacek Feb 07 '17 at 04:06
Well there are obviously some `considerations` for both cases. For `$where`, you could see https://docs.mongodb.com/manual/reference/operator/query/where/#considerations. Similarly, for `aggregation`, you could see http://stackoverflow.com/questions/38840669/mongodb-aggregation-performance-capability . And for `$redact`, you can see :http://stackoverflow.com/questions/32653442/redact-in-mongodb-seems-obscure-to-me. And I also recommend you to perform `explain()` on both queries to see how well both queries perform on your setup. – oblivion Feb 07 '17 at 05:08

chridam · Accepted Answer · 2017-02-09T07:21:18.813

You can use the aggregation framework with the $redact and $out pipelines to remove the documents which satisfy the given condition.

The $redact pipeline incorporates the functionality of $project and $match to implement field level redaction where it will return all documents matching the condition using $$KEEP and removes those that don't match using the $$PRUNE variable.

The $out pipeline writes the resulting documents of the aggregation pipeline to the same collection thus essentially doing an update operation.

Running the following aggregate operation will remove the documents:

db.collection.aggregate([
    {
        "$redact": {
            "$cond": [
                { 
                    "$lte": [ 
                        new Date(), 
                        { 
                            "$add": [
                                "$lastUpdatedTimestamp", 
                                { "$multiply": ["$TTLinSeconds", 1000] }
                            ] 
                        } 
                    ] 
                },
                "$$KEEP",
                "$$PRUNE"
            ]
        }
    },
    { "$out": "collection" }
])

As caveat, using the $where operator won't perform very well as it takes a while to process the query because of what MongoDB is doing behind the scenes: when you perform a normal (non-$where) query, your client turns that query into BSON and sends it to the database. MongoDB stores data in BSON, too, so it can basically compare your query directly against the data. This is very fast and efficient.

On the other hand, when you have a $where clause that must be executed as part of your query, MongoDB will have to create a JavaScript object for every document in the collection, parsing the documents' BSON and adding all of their fields to the JavaScript objects. It then executes the JavaScript you sent against the documents, then tears it all down again. This is an extremely time and resource-intensive operation as it calls the JavaScript engine to evaluate Javascript code on every document and checks the condition for each. It is advisable to combine with indexed queries if you can so that the query may be faster.

Some considerations you have to look at when using $where:

Do not use global variables.

$where evaluates JavaScript and cannot take advantage of indexes. Therefore, query performance improves when you express your query using the standard MongoDB operators (e.g., $gt, $in). In general, you should use $where only when you can’t express your query using another operator. If you must use $where, try to include at least one other standard query operator to filter the result set. Using $where alone requires a table scan. Using normal non-$where query statements provides the following performance advantages:

MongoDB will evaluate non-$where components of query before $where statements. If the non-$where statements match no documents, MongoDB will not perform any query evaluation using $where. The non-$where query statements may use an index.

$where is a good hack when necessary, but it should be avoided whenever possible. If a $where query is needed, you can cut down on the performance hit by minimising the number of documents that make it to the $where or create an additional computed denormalized field say expiryDate that is the sum of the lastUpdatedTimestamp and the TTLinSeconds fields which you can then query as:

db.collection.remove({ "expiryDate": { "$lt": new Date() }});

But still, such low-selectivity fields don't yield good index performance if the collection is very large thus the candidate set for indexing is large with this approach.

Thank you very much for your answer and proposed solution. It looks elegant, however I must admit I'm still struggling trying to understand the aggregation framework in MongoDB, especially its syntax. Would there be any recommendations towards using $redact against simple $where? Also, I'm not sure the line `{ "$add": ["$lastUpdatedTimestamp", "$TTLinSeconds"] }` will add seconds from the `TTLinSeconds` field to the `lastUpdatedTimestamp`? — Jacek, Feb 07 '17 at 04:13
Thank you very much for your edits! It has been a good learning experience. One more question: [The docs](https://docs.mongodb.com/manual/reference/operator/aggregation/add/) state: `If one of the arguments is a date, $add treats the other arguments as milliseconds to add to the date.`. How can I multiply `$TTLinSeconds` by 1000 in `{ "$add": ["$lastUpdatedTimestamp", "$TTLinSeconds"] }`? — Jacek, Feb 09 '17 at 02:03
To convert seconds to milliseconds, you need to use [**`$multiply`**](https://docs.mongodb.com/manual/reference/operator/aggregation/multiply/#exp._S_multiply) operator, as in the recent edit. — chridam, Feb 09 '17 at 07:22

Delete documents by date comparison between sum of date and numeric fields and current timestamp

3 Answers3