
Each document in the collection looks like the ones below. In this sample, categories A and C are fine, but category B has a duplicate.

{
  "_id": {
    "$oid": "5bef93fc1c4b3236e79f9c25" # all these are unique
  },
  "Created_at": "Sat Nov 17 04:07:12 +0000 2018",
  "ID": {
    "$numberLong": "1063644700727480320" # duplicates identified by this ID
  },
  "Category": "A" # this is the category
}

{
  "_id": {
    "$oid": "5bef93531c4b3236e79f9c11"
  },
  "Created_at": "Sat Nov 17 05:17:12 +0000 2018",
  "ID": {
    "$numberLong": "1063644018276360192"
  },
  "Category": "B" 
}

{
  "_id": {
    "$oid": "5bef94e81c4b3236e79f9c3b"
  },
  "Created_at": "Sat Nov 17 05:17:12 +0000 2018",
  "ID": {
    "$numberLong": "1063644018276360192"
  },
  "Category": "B" 
}

{
  "_id": {
    "$oid": "5bef94591c4b3236e79f9cee" 
  },
  "Created_at": "Sat Nov 17 05:17:12 +0000 2018",
  "ID": {
    "$numberLong": "1063644700727481111"
  },
  "Category": "C" 
}

Duplicates are identified by their ID. I want to count the duplicates in each category and print the result like this:

Category A : 5 (5 duplicates tagged Category A)

Category B : 6

Category C : 15

This is what I have tried, but it doesn't print anything. I have already seeded my MongoDB database with duplicates.

cursor = db.collection.aggregate([
    { 
        "$group": { 
            "_id": {"ID": "$ID"}, 
            "uniqueIds": { "$addToSet": "$_id" },
            "count": { "$sum": 1 } 
        }
    }, 
    { "$match": { "count": { "$gt": 1 } } }
])

for document in cursor:
    print(document)

Your help is appreciated :)

Joshua
  • It should work. Maybe your count is never greater than 1 (`$gt`)? Try this: `db.collection.aggregate([ { "$group": { "_id": "$ID", "uniqueIds": { "$addToSet": "$Category" }, "count": { "$sum": 1 } }} ])` – Ashh Nov 17 '18 at 11:57
  • Thanks for your help. I've tried your code but it still doesn't work. No errors either. It just prints nothing. – Joshua Nov 17 '18 at 12:04
  • I have added more documents. – Joshua Nov 17 '18 at 12:16
  • Take a look at https://mongoplayground.net/p/WtwN32is1G9. Is it OK? – Ashh Nov 17 '18 at 12:18
  • Yes, that looks good, but I still can't print the output. I need to print the result of db.collection.aggregate. – Joshua Nov 17 '18 at 12:19
  • Is it possible for the ID to be the same while the Category is different? – Arsen Davtyan Nov 17 '18 at 13:51

1 Answer

Try this:

db.collection.aggregate([
    {
        $group : {
            "_id" : {"ID" : "$ID", "Category" : "$Category"},
            "Count" : {$sum : 1}
        }
    },
    {
        $match : {
            "Count" : {$gt : 1}
        }
    },
    {
        $project : {
            "_id" : 0,
            "ID" : "$_id.ID",
            "Category" : "$_id.Category",
            "Count" : "$Count"
        }
    }
]);
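Since the question uses Python, here is a minimal sketch of how the pipeline above might be run from pymongo and printed in the "Category X : N" form the question asks for. It is an illustration under assumptions, not a tested solution for your data: the names `DUPLICATES_PER_CATEGORY` and `format_duplicate_counts` are made up for this example, the final `$project` is swapped for a second `$group` that adds up the counts per category, and "number of duplicates" is assumed to mean every document whose ID occurs more than once (B would give 2 for the sample documents).

```python
# Pipeline as a plain list of dicts, which is what pymongo's aggregate() expects.
DUPLICATES_PER_CATEGORY = [
    # Count the documents that share the same ID and Category.
    {"$group": {
        "_id": {"ID": "$ID", "Category": "$Category"},
        "Count": {"$sum": 1},
    }},
    # Keep only the IDs that occur more than once.
    {"$match": {"Count": {"$gt": 1}}},
    # Assumption: a category's total is the sum of the documents in its
    # duplicated groups (not the number of extra copies).
    {"$group": {"_id": "$_id.Category", "Duplicates": {"$sum": "$Count"}}},
    {"$sort": {"_id": 1}},
]


def format_duplicate_counts(collection):
    """Run the pipeline on a pymongo collection and return printable lines."""
    return [
        "Category {} : {}".format(row["_id"], row["Duplicates"])
        for row in collection.aggregate(DUPLICATES_PER_CATEGORY)
    ]


# Usage (database and collection names are placeholders, replace with yours):
#   from pymongo import MongoClient
#   db = MongoClient()["my_database"]
#   for line in format_duplicate_counts(db["my_collection"]):
#       print(line)
```

If nothing is printed at all, one thing worth checking is the collection name: in pymongo, `db.collection` refers to a collection that is literally named `collection`, so an aggregation on it returns an empty cursor when the data lives under a different name.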

Hope this helps!

Arsen Davtyan