Say I have the following documents:

{"_key": "1", "name": "George Washington"}
{"_key": "2", "name": "George Washington"}
{"_key": "3", "name": "John Adams"}
{"_key": "4", "name": "Thomas Jefferson"}
{"_key": "5", "name": "George Washington"}
{"_key": "6", "name": "Thomas Jefferson"}

I want to write an AQL statement that returns the keys of the documents, grouped by name, but only for names that occur more than once.

So my desired output is:

[["1", "2", "5"], ["4", "6"]] 

So far I have come up with:

FOR doc IN documents
    LET key = doc._key
    COLLECT name = doc.name INTO groups KEEP key
    RETURN (FOR g IN groups RETURN g["key"])

This returns:

[["1", "2", "5"], ["3"], ["4", "6"]]

How can I modify the AQL command to only get arrays with two or more entries?

Elias Strehle

2 Answers

1

Another possibility (potentially a bit more efficient as no subquery is involved):

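FOR doc IN documents
  LET key = doc._key
  // KEEP restricts each group entry to the "key" variable
  COLLECT name = doc.name INTO groups KEEP key
  // [*] expands the groups array and projects the "key" attribute
  LET keys = groups[*].key
  FILTER LENGTH(keys) > 1
  RETURN keys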
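
To sanity-check the expected result outside the database, the same group-then-filter logic can be sketched in plain Python. This is only an illustration against the sample documents from the question: the function name keys_of_repeated_names is hypothetical, and sorting the groups by name is an assumption that mirrors how COLLECT normally orders its groups.

```python
from collections import defaultdict

# Sample documents copied from the question.
documents = [
    {"_key": "1", "name": "George Washington"},
    {"_key": "2", "name": "George Washington"},
    {"_key": "3", "name": "John Adams"},
    {"_key": "4", "name": "Thomas Jefferson"},
    {"_key": "5", "name": "George Washington"},
    {"_key": "6", "name": "Thomas Jefferson"},
]


def keys_of_repeated_names(docs):
    """Group _key values by name; keep only groups with more than one key.

    Hypothetical helper for illustration, not part of any ArangoDB API.
    """
    groups = defaultdict(list)
    for doc in docs:
        # Plays the role of COLLECT name = doc.name INTO groups KEEP key.
        groups[doc["name"]].append(doc["_key"])
    # Plays the role of FILTER LENGTH(keys) > 1; groups are sorted by name
    # (an assumption matching the usual ordering of COLLECT output).
    return [keys for _name, keys in sorted(groups.items()) if len(keys) > 1]


result = keys_of_repeated_names(documents)
print(result)
```

The single-entry group for "John Adams" is dropped by the length check, which is the same job the FILTER line does in the AQL query.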
stj
  • Thanks, this works as well. I tested both queries (with an added LIMIT 1000) on my original data set, which has about 4.5 million documents. My query took 70 seconds, yours 60! – Elias Strehle Oct 02 '17 at 10:47
0

Solved it:

FOR doc IN documents
    LET key = doc._key
    COLLECT name = doc.name INTO groups KEEP key
    LET groups2 = (FOR group IN groups RETURN group["key"])
    FILTER LENGTH(groups2) >= 2
    RETURN groups2
Elias Strehle