We need to identify (and avoid) duplicate records that are stored in an array inside a parent document, e.g.:

{
  _id: 1,
  item: "abc",
  stock: [
    { size: "S", color: "red", quantity: 25 },
    { size: "S", color: "blue", quantity: 10 },
    { size: "M", color: "blue", quantity: 50 }
  ]
}
{
  _id: 2,
  item: "def",
  stock: [
    { size: "S", color: "blue", quantity: 25 },
    { size: "M", color: "blue", quantity: 5 },
    { size: "M", color: "black", quantity: 10 },
    { size: "L", color: "red", quantity: 2 }
  ]
}
{
  _id: 3,
  item: "ijk",
  stock: [
    { size: "S", color: "red", quantity: 25 },
    { size: "S", color: "blue", quantity: 10 },
    { size: "M", color: "blue", quantity: 50 }
  ]
}

In this example, the documents with _id: 1 and _id: 3 are duplicates, since all the elements of their stock arrays are exactly the same.

We have been trying to use the following reference, but the multikey index described there does not consider all the elements of the array as a whole, only a single property such as the color:

https://docs.mongodb.com/manual/core/index-multikey/#multikey-embedded-documents

If you have any other suggestions or workarounds that could help us, we would very much appreciate them :)

Neoluis10
  • Since you need to remove duplicates, which item (drf or ijk) should be in the output? – varman Aug 18 '20 at 03:36
  • You might define a hash function for the array that sorts the elements and object fields by some predefined ruleset, and computes a consistent hash of the array. Then store that value in a field with a unique index. – Joe Aug 18 '20 at 04:15
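The hashing idea in the comment above could be sketched roughly as follows. This is only an illustration under assumptions, not a tested solution: the helper names `canonicalize` and `stockHash`, the choice of SHA-256, and the decision to treat the array as order-insensitive are all made up for this example. It sorts the keys of each element, sorts the serialized elements, and hashes the result, so two arrays with the same elements get the same hash.

```javascript
const crypto = require("crypto");

// Rebuild a value with object keys in sorted order, so that the field
// order inside an element does not change its serialized form.
function canonicalize(value) {
  if (Array.isArray(value)) return value.map(canonicalize);
  if (value !== null && typeof value === "object") {
    const sorted = {};
    for (const key of Object.keys(value).sort()) {
      sorted[key] = canonicalize(value[key]);
    }
    return sorted;
  }
  return value;
}

// Hypothetical helper: a hash of the stock array that ignores element
// order (an assumption; drop the .sort() call if order should matter).
function stockHash(stock) {
  const elements = stock.map((el) => JSON.stringify(canonicalize(el))).sort();
  return crypto
    .createHash("sha256")
    .update(JSON.stringify(elements))
    .digest("hex");
}

// The three documents from the question.
const docs = [
  { _id: 1, item: "abc", stock: [
    { size: "S", color: "red", quantity: 25 },
    { size: "S", color: "blue", quantity: 10 },
    { size: "M", color: "blue", quantity: 50 },
  ] },
  { _id: 2, item: "def", stock: [
    { size: "S", color: "blue", quantity: 25 },
    { size: "M", color: "blue", quantity: 5 },
    { size: "M", color: "black", quantity: 10 },
    { size: "L", color: "red", quantity: 2 },
  ] },
  { _id: 3, item: "ijk", stock: [
    { size: "S", color: "red", quantity: 25 },
    { size: "S", color: "blue", quantity: 10 },
    { size: "M", color: "blue", quantity: 50 },
  ] },
];

// Attach the hash to each document before it is written.
for (const doc of docs) {
  doc.stockHash = stockHash(doc.stock);
}
```

With the hash stored in a field (here assumed to be called stockHash), a unique index such as db.inventory.createIndex({ stockHash: 1 }, { unique: true }) would reject a second document with the same array; the collection name inventory is an assumption. Existing duplicates would have to be cleaned up before such an index can be built, and the application would need to recompute the hash whenever the array changes.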

0 Answers