I am trying to use a single gremlin query to determine the percentage of vertices that satisfy a certain predicate, but I'm having trouble storing and propagating the computed values.

Say I want to compute the percentage of all vertices with label "A" that have an outgoing edge with label "B". I can print out the number of vertices with label "A", as well as the number of those vertices that have an outgoing edge with label "B", in the same query:

g.V().limit(1).project("total","withEdgeB")
 .by(g.V().hasLabel("A").count())
 .by(g.V().hasLabel("A").match(__.as("a").outE("B").inV()).dedup().count())

This gives me the two relevant values: total and withEdgeB. How do I propagate and calculate with those values?

Ideally, I want something like this:

g.V().limit(1).project("total","withEdgeB","percentage")
 .by(g.V().hasLabel("A").count().as("totalA"))
 .by(g.V().hasLabel("A").match(__.as("a").outE("B").inV()).dedup().count().as("totalWithEdgeB"))
 .by(totalWithEdgeB / totalA)

So my question is, how can I access the values totalA and totalWithEdgeB in the third by() statement? Or am I going about this all wrong?

1 Answer

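I would use some simple calculations: map each vertex to 1 if it satisfies the predicate and 0 otherwise, fold those values into a list, and then take the count, sum, and mean of that list. The mean of the 1/0 values is the fraction you are looking for. Using the modern graph, find all person vertices with outgoing created edges: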

gremlin> g.V().hasLabel('person').
           choose(outE('created'),
                    constant(1),
                    constant(0)).fold().
           project('total','withCreated','percentage').
             by(count(local)).
             by(sum(local)).
             by(mean(local))
==>[total:4,withCreated:3,percentage:0.75]
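To see why the mean gives the percentage, here is a minimal plain-Python sketch of the same arithmetic. It does not talk to a graph; the `summarize` helper and the hard-coded flags are hypothetical, chosen to mirror the four person vertices of the modern graph (three of which have an outgoing created edge). Returning `None` for an empty input is an assumption of this sketch, not a statement about how Gremlin behaves.

```python
# Hypothetical stand-in for the traversal: one flag per 'person' vertex,
# True when the vertex has at least one outgoing 'created' edge.
has_created = [True, False, True, True]


def summarize(flags):
    """Mirror choose(...).fold() followed by count/sum/mean on the list."""
    # choose(outE('created'), constant(1), constant(0)).fold()
    indicators = [1 if flag else 0 for flag in flags]
    total = len(indicators)       # count(local)
    with_edge = sum(indicators)   # sum(local)
    # mean(local): for 1/0 values this is with_edge / total.
    # Assumption of this sketch: report None when there are no vertices.
    percentage = with_edge / total if total else None
    return {"total": total, "withCreated": with_edge, "percentage": percentage}


print(summarize(has_created))
# prints {'total': 4, 'withCreated': 3, 'percentage': 0.75}
```

Because every value is either 1 or 0, the sum counts the matching vertices and the mean is that sum divided by the total, so no separate division step is needed.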
Daniel Kuppitz