
It's strange that I couldn't find an answer, either in the documentation or here, to a very simple question: how do I find duplicate records in a collection? For example, I need to find duplicates by id in the following documents:

{"id": 1, name: "Mike"},
{"id": 2, name: "Jow"},
{"id": 3, name: "Piter"},
{"id": 1, name: "Robert"}

I need a query that will return the two documents with the same id (id: 1 in my case).
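To reproduce this, the sample documents can be loaded with a query along these lines. This is a sketch that assumes a collection named myCollection (the name used in the answer) already exists.

```aql
// Sketch: insert the four sample documents from the question.
// Assumes the collection myCollection already exists.
FOR doc IN [
  { id: 1, name: "Mike" },
  { id: 2, name: "Jow" },
  { id: 3, name: "Piter" },
  { id: 1, name: "Robert" }
]
  INSERT doc INTO myCollection
```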

Dmitry Bubnenkov
Can you elaborate a bit more? What should be the result of the query? Just the duplicate id? The complete documents that contain the duplicate ids? – mpoeter Jul 01 '20 at 08:32

1 Answer


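Have a look at the AQL COLLECT operation. It groups documents by a value, such as your id attribute, and with WITH COUNT INTO it can return how many documents share each value. Any value with a count greater than 1 is a duplicate.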

ArangoDB AQL - COLLECT
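If only the duplicated id values are needed (one of the options raised in the comment on the question), the grouping step on its own may be enough. A minimal sketch, assuming the collection is named myCollection:

```aql
// Sketch: list each id value that occurs more than once, with its count.
// Assumes the collection is named myCollection.
FOR d IN myCollection
  COLLECT id = d.id WITH COUNT INTO count  // one group per distinct id
  FILTER count > 1                         // keep only repeated ids
  RETURN { id: id, count: count }
```

Against the four sample documents from the question, this should return a single entry for id 1 with a count of 2.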

You can use LET in AQL to break a query down into smaller steps and use the result of one step later in the same query.

It may also be possible to collapse everything into a single step, but this technique keeps each part easy to follow.

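// Step 1: find every id value that occurs more than once
LET duplicates = (
    FOR d IN myCollection
    COLLECT id = d.id WITH COUNT INTO count
    FILTER count > 1
    RETURN {
        id: id,
        count: count
    }
)

// Step 2: return every document whose id is one of the duplicated values
FOR d IN duplicates
    FOR m IN myCollection
        FILTER d.id == m.id
        RETURN m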

This will return:

[
  {
    "_key": "416140",
    "_id": "myCollection/416140",
    "_rev": "_au4sAfS--_",
    "id": 1,
    "name": "Mike"
  },
  {
    "_key": "416176",
    "_id": "myCollection/416176",
    "_rev": "_au4sici--_",
    "id": 1,
    "name": "Robert"
  }
]
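As mentioned above, the two steps can probably be collapsed. A minimal sketch of a single-pass version, again assuming the collection is named myCollection: COLLECT ... INTO keeps each group's members (here the projection docs = d stores just the documents), so groups with more than one member can be unpacked directly instead of scanning the collection a second time.

```aql
// Sketch: group by id, keep each group's documents, and emit only
// the members of groups that contain more than one document.
FOR d IN myCollection
  COLLECT id = d.id INTO docs = d
  FILTER LENGTH(docs) > 1
  FOR doc IN docs
    RETURN doc
```

Returning { id: id, docs: docs } instead of the inner FOR loop would give the duplicates grouped per id. Because INTO holds the documents of every group while collecting, the two-step version above may be preferable on very large collections.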
David Thomas