
Every document in my collection has exactly one duplicate that is identical except for its _id:

{ _id: ObjectId("5ff22dcd3c8ce5f425c08a6d"),
  model: '1r9',
  path: 'path1.png',
  xmax: 460,
  xmin: 395,
  ymax: 464,
  ymin: 406 }

{ _id: ObjectId("5ff42dcd7c8ce5f425c08a70"),
  model: '1r9',
  path: 'path1.png',
  xmax: 460,
  xmin: 395,
  ymax: 464,
  ymin: 406 }
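One way to make this definition of "duplicate" precise is to key each document on every field except _id, so that documents sharing a key are duplicates. The following is a minimal in-memory sketch of that idea. It assumes flat documents with primitive field values like the two above, and it uses plain strings in place of ObjectId so it runs without the MongoDB driver. The helper names duplicateKey and idsToDelete are hypothetical. It only illustrates the grouping logic; on a real collection the grouping should happen server-side in an aggregation.

```javascript
// Build a grouping key from every field except _id.
// Assumes flat documents with primitive values (as in the sample above).
function duplicateKey(doc) {
  const { _id, ...rest } = doc;
  // Sort field names so that field order inside a document does not matter
  return JSON.stringify(Object.keys(rest).sort().map((k) => [k, rest[k]]));
}

// Return the _ids to delete: every document after the first one in each group.
function idsToDelete(docs) {
  const seen = new Set();
  const toDelete = [];
  for (const doc of docs) {
    const key = duplicateKey(doc);
    if (seen.has(key)) {
      toDelete.push(doc._id);
    } else {
      seen.add(key);
    }
  }
  return toDelete;
}

// The two sample documents, with _id values as plain strings (assumption)
const docs = [
  { _id: "5ff22dcd3c8ce5f425c08a6d", model: "1r9", path: "path1.png", xmax: 460, xmin: 395, ymax: 464, ymin: 406 },
  { _id: "5ff42dcd7c8ce5f425c08a70", model: "1r9", path: "path1.png", xmax: 460, xmin: 395, ymax: 464, ymin: 406 },
];
console.log(idsToDelete(docs)); // [ '5ff42dcd7c8ce5f425c08a70' ]
```

The key deliberately covers all non-_id fields: grouping on a single field such as model would also treat documents with different paths or coordinates as duplicates.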

I have tried many of the solutions in the question "Fastest way to remove duplicate documents in mongodb".

However, I am using a MongoDB Atlas cluster, which does not allow allowDiskUse: true.

Is there a way to delete these duplicates without looping over the entire collection, which would take a long time?

law826

I recently wrote code to delete duplicate documents from MongoDB; this should work:

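// `collection` here is a Mongoose model: aggregate().cursor(), eachAsync()
// and Model.deleteMany() are Mongoose APIs. Run this inside an async function.
const query = [
  {
    // Group on every field that makes two documents duplicates (all but _id)
    $group: {
      _id: {
        model: "$model",
        path: "$path",
        xmax: "$xmax",
        xmin: "$xmin",
        ymax: "$ymax",
        ymin: "$ymin",
      },
      dups: {
        $addToSet: "$_id",
      },
      count: {
        $sum: 1,
      },
    },
  },
  {
    // Keep only groups that contain more than one document
    $match: {
      count: {
        $gt: 1,
      },
    },
  },
];

// Mongoose 6+ returns the cursor directly; on Mongoose 5 append .exec()
const cursor = collection.aggregate(query).cursor({ batchSize: 10 });

await cursor.eachAsync(async (doc) => {
  doc.dups.shift(); // Keep the first _id of each group; delete the rest
  // One awaited deleteMany per group instead of un-awaited per-id deletes
  await collection.deleteMany({ _id: { $in: doc.dups } });
});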
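The code above relies on Mongoose. If you use the official MongoDB Node.js driver directly, a comparable sketch could look like the following. It assumes the duplicate-defining fields are exactly those in the question's sample documents (DUP_FIELDS), and removeDuplicates is a hypothetical helper name. It uses the driver's Collection.aggregate (whose cursor supports for await) and Collection.deleteMany, deleting each group's extra _ids in one call.

```javascript
// Sketch for the official MongoDB Node.js driver (not Mongoose).
// DUP_FIELDS is an assumption taken from the sample documents in the question.
const DUP_FIELDS = ["model", "path", "xmax", "xmin", "ymax", "ymin"];

async function removeDuplicates(collection, fields = DUP_FIELDS) {
  // Group key such as { model: "$model", path: "$path", ... }
  const groupKey = Object.fromEntries(fields.map((f) => [f, `$${f}`]));
  const pipeline = [
    { $group: { _id: groupKey, dups: { $push: "$_id" }, count: { $sum: 1 } } },
    { $match: { count: { $gt: 1 } } },
  ];

  let deleted = 0;
  // aggregate() returns a cursor that can be consumed with for await
  for await (const group of collection.aggregate(pipeline, { batchSize: 100 })) {
    const [, ...extraIds] = group.dups; // keep the first _id of each group
    const res = await collection.deleteMany({ _id: { $in: extraIds } });
    deleted += res.deletedCount;
  }
  return deleted;
}
```

Usage would be along the lines of `const n = await removeDuplicates(db.collection("boxes"));`, where "boxes" is a hypothetical collection name. Like the Mongoose version, this still runs a single $group over the whole collection on the server, so it does not by itself avoid the memory limit that allowDiskUse exists to lift; it only reduces the number of delete round trips.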