Help me please. I'm working with Mongoose and want to find duplicates in the "ViolationID" field and delete them, keeping only the most recently created one. I think _id: { $lt: record._id } works incorrectly; maybe the types of _id and record._id don't match. I'm confused, because the documents in the collection are not being deleted.

Violation.find({}, { "ViolationID": 1 })
    .sort({ _id: 1 })
    .then((violations) => {
      violations.forEach(function (record) {
        Violation.deleteMany({
          _id: { $lt: record._id },
          "ViolationID": record["ViolationID"],
        });
      });
});

Below are the documents that cannot be deleted:

{
"_id": "649adc629b36c9ee95228d96",
"ViolationID": 98
},
{
"_id": "649add653629f115a960d498",
"ViolationID": 98
}

I tried "Remove duplicate documents based on field" and other threads, but those solutions don't work for me.

Artembash

1 Answer

I don't see why it does not work. Anyway, my approach would be this one:

let docs = []
db.collection.aggregate([
   {
      $setWindowFields: {
         partitionBy: "$ViolationID",
         sortBy: { _id: -1 },
         output: {
            pos: { $documentNumber: {} }
         }
      }
   },
   { $match: { pos: { $gt: 1 } } },
   { $project: { _id: 1 } }
]).toArray().forEach(doc => {
   docs.push(doc._id);
   // Delete in batches so the $in list stays small
   if (docs.length > 10000) {
      db.collection.deleteMany({ _id: { $in: docs } });
      docs = [];
   }
})
// Delete whatever is left in the last partial batch
if (docs.length > 0)
   db.collection.deleteMany({ _id: { $in: docs } });
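
The snippet above is written for mongosh. Since the question uses Mongoose, here is a rough sketch of the same idea as an async function. One likely reason the original deleteMany calls had no effect is that a Mongoose query is only sent to the server once it is awaited, chained with .then(), or run with .exec(); in the question's loop none of these happens. The names `removeOlderDuplicates` and `chunk` are hypothetical helpers, not part of Mongoose, and the model is passed in as a parameter so nothing about the connection is assumed. `$setWindowFields` needs MongoDB 5.0 or newer.

```javascript
// Split an array into consecutive chunks of at most `size` items.
function chunk(items, size) {
  const chunks = [];
  for (let i = 0; i < items.length; i += size) {
    chunks.push(items.slice(i, i + size));
  }
  return chunks;
}

// Hypothetical helper: `Model` is any Mongoose model (for example the
// question's `Violation`) and `field` is the field that should be unique.
async function removeOlderDuplicates(Model, field, batchSize = 10000) {
  // Number the documents within each group of equal `field` values,
  // newest _id first, so the document to keep gets pos = 1.
  const stale = await Model.aggregate([
    {
      $setWindowFields: {
        partitionBy: `$${field}`,
        sortBy: { _id: -1 },
        output: { pos: { $documentNumber: {} } },
      },
    },
    { $match: { pos: { $gt: 1 } } },
    { $project: { _id: 1 } },
  ]);

  const ids = stale.map((doc) => doc._id);
  let deleted = 0;
  for (const batch of chunk(ids, batchSize)) {
    // Awaiting the query is what actually sends it to MongoDB.
    const result = await Model.deleteMany({ _id: { $in: batch } });
    deleted += result.deletedCount;
  }
  return deleted;
}
```

Inside an async function it could be called as `const deleted = await removeOlderDuplicates(Violation, "ViolationID");`. The ids are collected first and deleted in batches afterwards, which keeps each `$in` list bounded.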

   
Wernfried Domscheit
  • `"ViolationID": record["ViolationID"]` is needed in OP's approach since they are deleting multiple documents and the only other condition is a range predicate on `_id`. Without it they'll delete documents that have a lower `_id` value but a different `ViolationID`. – user20042973 Jun 29 '23 at 11:34
  • @user20042973 indeed, I missed that. Modified my answer. – Wernfried Domscheit Jun 29 '23 at 11:39