
I'm trying to generate facets (aggregation counts) for documents like the following, stored in a graph (based on collections rather than a named graph):

{
  "relation": "decreases",
  "edge_type": "primary",
  "subject_lbl": "act(p(HGNC:AKT1), ma(DEFAULT:kin))",
  "object_lbl": "act(p(HGNC:CDKN1B), ma(DEFAULT:act))",
  "annotations": [
    {
      "type": "Disease",
      "label": "cancer",
      "id": "cancer"
    },
    {
      "type": "Anatomy",
      "label": "liver",
      "id": "liver"
    }
  ]
}

The following works great to get facets (aggregation counts) for the edge_type:

FOR doc IN edges
    COLLECT edge_type = doc.edge_type WITH COUNT INTO edge_type_cnt
    RETURN {edge_type, edge_type_cnt}

I tried the following to also get counts for the annotations[*].type values:

FOR doc in edges
COLLECT 
    edge_type = doc.edge_type WITH COUNT INTO edge_type_cnt,
    annotations = doc.annotations[*].type WITH COUNT INTO anno_cnt
RETURN {edge_type, edge_type_cnt, annotations, anno_cnt}

This results in an error. Any ideas what I'm doing wrong? Thanks!

William
`WITH COUNT INTO` can only appear once in a COLLECT operation. It is possible to aggregate multiple values with `COLLECT ... AGGREGATE`, but it doesn't make much sense to build a simple sum twice or more: with the same grouping criteria, it will always be the same value. – CodeManX Feb 06 '18 at 09:33
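To illustrate the `COLLECT ... AGGREGATE` form mentioned in the comment, here is a minimal, untested sketch against the `edges` collection from the question. It groups by `edge_type` once and computes two different aggregates in the same COLLECT. The count uses `LENGTH(1)` inside AGGREGATE, which is assumed here to give the same number as `WITH COUNT INTO`; `anno_total` is a hypothetical name for the total number of annotations per edge type.

```aql
// Sketch: one grouping criterion, several aggregates in a single COLLECT
FOR doc IN edges
  COLLECT edge_type = doc.edge_type
  AGGREGATE edge_type_cnt = LENGTH(1),                 // edges per edge_type
            anno_total = SUM(LENGTH(doc.annotations))  // hypothetical: annotations per edge_type
  RETURN {edge_type, edge_type_cnt, anno_total}
```

Each aggregate is computed over the same groups, which is why a second plain count would just repeat the first one.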

1 Answer


This thread, https://groups.google.com/forum/#!topic/arangodb/vNFNVrYo9Yo, which is linked from the question "ArangoDB Faceted Search Performance", pointed me in the right direction:

FOR doc IN edges
    FOR anno IN doc.annotations
        COLLECT anno_type = anno.type WITH COUNT INTO anno_cnt
        RETURN {anno_type, anno_cnt}

This returns:

anno_type       anno_cnt
Anatomy         4275
Cell            2183
CellLine        2093
CellStructure   2081
Disease         2126
Organism        2075
TextLocation    2121
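The output above happens to be ordered by type name. If the most frequent annotation types should come first instead, a `SORT` placed after the `COLLECT` ought to be enough. This is an untested sketch that reuses the variable names from the query above:

```aql
FOR doc IN edges
  FOR anno IN doc.annotations
    COLLECT anno_type = anno.type WITH COUNT INTO anno_cnt
    SORT anno_cnt DESC   // most frequent annotation types first
    RETURN {anno_type, anno_cnt}
```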

Looping over the edges and then the annotations array is the key that I was missing.
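Since `WITH COUNT INTO` can only appear once per COLLECT, one way to get both the `edge_type` facet and the annotation-type facet from a single request is to run each COLLECT in its own subquery and return the results together. A minimal, untested sketch, assuming the same `edges` collection; `edge_type_facet` and `anno_type_facet` are hypothetical names:

```aql
// Each subquery has its own COLLECT, so each one can use WITH COUNT INTO
LET edge_type_facet = (
  FOR doc IN edges
    COLLECT edge_type = doc.edge_type WITH COUNT INTO edge_type_cnt
    RETURN {edge_type, edge_type_cnt}
)
LET anno_type_facet = (
  FOR doc IN edges
    FOR anno IN doc.annotations
      COLLECT anno_type = anno.type WITH COUNT INTO anno_cnt
      RETURN {anno_type, anno_cnt}
)
RETURN {edge_type_facet, anno_type_facet}
```

Note that the annotation facet counts annotations rather than edges, so an edge carrying two annotations of the same type contributes 2 to that type's count. This version also scans the collection once per facet, trading a second pass for a simpler query.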

William