Using the Cosmos DB Gremlin API, I'm trying to write a Gremlin query that summarizes edges by edge label and by the labels of their in and out vertices, with a count for each combination.

The closest I've come up with only deduplicates the combinations; it doesn't count them. Any help would be greatly appreciated.

g.E().project('edge','in','out').
by(label()).
by(inV().label()).
by(outV().label()).dedup()

Output:

[
  {
    "edge": "uses",
    "in": "software-system",
    "out": "person"
  },
  {
    "edge": "runs on",
    "in": "container",
    "out": "software-system"
  },
  {
    "edge": "requires",
    "in": "component",
    "out": "container"
  },
  {
    "edge": "embeds",
    "in": "code",
    "out": "component"
  }
]

Ideal output:

[
  {
    "edge": "uses",
    "in": "software-system",
    "out": "person",
    "count": 105
  },
  {
    "edge": "runs on",
    "in": "container",
    "out": "software-system",
    "count": 22
  },
  {
    "edge": "requires",
    "in": "component",
    "out": "container",
    "count": 15
  },
  {
    "edge": "embeds",
    "in": "code",
    "out": "component",
    "count": 6
  }
]

1 Answer

I think I would approach it this way with a combination of groupCount() and project():

gremlin> g.E().groupCount().
......1>         by(project('edge','in','out').
......2>              by(label).
......3>              by(inV().label()).
......4>              by(outV().label())).
......5>   unfold()
==>{edge=created, in=software, out=person}=4
==>{edge=knows, in=person, out=person}=2

If your graph database can't support maps as keys, then you might need to transform the result further:

gremlin> g.E().groupCount().
......1>         by(project('edge','in','out').
......2>              by(label).
......3>              by(inV().label()).
......4>              by(outV().label())).
......5>   unfold().
......6>   map(union(select(keys), select(values)).fold())
==>[[edge:created,in:software,out:person],4]
==>[[edge:knows,in:person,out:person],2]
stephen mallette
  • In Cosmos DB this fails with: Failure in submitting query: g.E().groupCount().by(project('edge','in','out').by(label).by(inV().label()).by(outV().label())): Server serialization error: ActivityId : a61f4416-f935-4d97-9855-d0ad6bd4c543 ExceptionType : GraphSerializeException ExceptionMessage : Gremlin Serialization Error: GraphSON V1_0 serializer cannot serialize object of type: MapField to a primitive value to perform the desired Gremlin step. GremlinRequestId : 062a147b-b6e9-4994-9164-d08ee0076cd3 Context : graphcompute Scope : graphcomp-execquery GraphInterOpStatusCode : SerializationError HResult : 0x80131500 – Νικόλαος Μανωλακος Feb 15 '22 at 14:09
  • Added a possible way for this query to work on Cosmos DB. – stephen mallette Feb 15 '22 at 14:44
  • [ [ { "edge": "uses", "in": "software-system", "out": "person" }, 204 ], [ { "edge": "runs on", "in": "container", "out": "software-system" }, 221 ], [ { "edge": "requires", "in": "component", "out": "container" }, 210 ], [ { "edge": "embeds", "in": "code", "out": "component" }, 213 ] ] – Νικόλαος Μανωλακος Feb 16 '22 at 18:40
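
The [map, count] pairs in the last comment are close to the flat shape asked for in the question, but the count still sits next to each map rather than inside it. If reshaping inside the query isn't an option, one way to close the gap is a small client-side step. The following is only a minimal Python sketch, assuming the result has already been parsed into [map, count] pairs exactly as posted above; `flatten_counts` is a hypothetical helper name and not part of any Gremlin driver.

```python
import json

# Sample data copied from the result posted in the comment above:
# each entry is a [key_map, count] pair.
pairs = [
    [{"edge": "uses", "in": "software-system", "out": "person"}, 204],
    [{"edge": "runs on", "in": "container", "out": "software-system"}, 221],
    [{"edge": "requires", "in": "component", "out": "container"}, 210],
    [{"edge": "embeds", "in": "code", "out": "component"}, 213],
]


def flatten_counts(pairs):
    """Hypothetical helper: merge each [key_map, count] pair into one flat dict."""
    # Copy the key map and add the count under a "count" field.
    return [{**key_map, "count": count} for key_map, count in pairs]


summary = flatten_counts(pairs)
print(json.dumps(summary, indent=2))
```

Each resulting entry carries the "edge", "in", "out", and "count" fields in a single object, which is the layout shown under "ideally output" in the question.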