gremlin query to control return result format

Question

We can get all the data we want from a query with Gremlin.

For example,

g.V(1234)
  .project("identifier", "associations")
  .by(valueMap(true).by(unfold()))
  .by(bothE().local(elementMap()).fold())

This query takes a long time to execute because vertex 1234 has 10k+ edges.

After some experimentation, we have seen that limiting the amount of data coming back makes the query return faster. However, I cannot get it to work completely.

Instead of showing you what I've created so far, I'd like to frame my question around what I want.

Given that I have a vertex with 10k+ edges, how can I minimize the amount of data returned in the result payload, so the query returns faster?

An example payload I was thinking of would be a hash map of:

{
  // edge direction
  "IN": {
    // edge label
    "MEMBER": [
      "4567" // vertex identifier going to (this would not be 1234 as we know that already)
    ]
  },
  // edge direction
  "OUT": {
    // edge label
    "BICYCLE": [
      "7890" // vertex identifier going to (this would not be 1234 as we know that already)
    ]
  }
}

When I started to approach the above, my query started to use group().by(...). However, I could not work out how to make the by(...) values a single key rather than the whole edge representation.

For example,

g
 .V("1234")
 .outE()
 .limit(10)
 .local(union(
     label(),
     inV().id()
 ).fold())
 .group().by(0)
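If the aim of `group().by(0)` is a map from edge label to a list of neighbouring vertex IDs (as in the payload sketched above), the same reshaping can also be done in the application once the results arrive. The Python sketch below assumes the query returns one `[label, id]` list per edge, the shape produced by the `local(union(label(), inV().id()).fold())` step; the function name and the sample IDs (including `4568`) are hypothetical.

```python
from collections import defaultdict
from typing import Dict, Iterable, List, Sequence


def group_pairs_by_label(pairs: Iterable[Sequence[str]]) -> Dict[str, List[str]]:
    """Group [label, vertex_id] pairs by label, keeping only the ID as the value."""
    # Assumption: each row is a [label, inV id] list, as built by
    # union(label(), inV().id()).fold() in the query above.
    grouped: Dict[str, List[str]] = defaultdict(list)
    for label, vertex_id in pairs:
        grouped[label].append(vertex_id)
    return dict(grouped)


# Hypothetical rows, one [label, inV id] pair per outgoing edge.
rows = [["MEMBER", "4567"], ["BICYCLE", "7890"], ["MEMBER", "4568"]]
print(group_pairs_by_label(rows))
# {'MEMBER': ['4567', '4568'], 'BICYCLE': ['7890']}
```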

I have scoured the docs (book and tutorial). They are useful, but I think I'm stuck because I'm either using the wrong vocabulary or asking for something that isn't possible.

I'm open to different approaches; this is just how I was thinking about it. If there are more efficient ways to get this data, I'd like to hear about them.

Is the main issue, in essence, that you want to return mainly the labels and IDs and not any of the properties? `elementMap` will give you pretty much what you need but will also include any properties on an edge. I don't think using `group` should be needed, but if you use `group` what do you want the key to be? — Kelvin Lawrence, Jun 01 '22 at 13:27
I'll add an answer below that at least addresses the `group` question. — Kelvin Lawrence, Jun 01 '22 at 13:38
@KelvinLawrence, in my example payload (looks like JSON), I tried to document the keys and the value the key represents. `elementMap` returns more information than needed. I just require the label of the edge and the inV/outV identifier. — jtarchie, Jun 01 '22 at 13:39

Kelvin Lawrence · Answer 1 · 2022-06-01T15:20:34.807

While I think using elementMap and returning fewer properties to reduce the payload size is probably the better way to go, here is an example that uses group to demonstrate one way of building such a query. I used the air-routes data for this test.

gremlin> g.V("1234").
......1>   group().
......2>     by(id).
......3>     by(local(outE().limit(10).union(label(), inV().id()).fold()))

==>[1234:[route,68,route,76,route,77,route,1233]]
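Because this result interleaves labels and IDs in one flat list, the client has to pair them back up to get something like the label-keyed map in the question. A minimal Python sketch, assuming the result has already been deserialized into a dict of lists like the one printed above (`pair_up` is a hypothetical helper, not part of any driver):

```python
from collections import defaultdict
from typing import Dict, List, Sequence


def pair_up(flat: Sequence[object]) -> Dict[object, List[object]]:
    """Turn [label1, id1, label2, id2, ...] into {label: [ids]}."""
    # Assumption: the result is already deserialized into Python lists.
    if len(flat) % 2:
        raise ValueError("expected an even number of items (label, id pairs)")
    grouped: Dict[object, List[object]] = defaultdict(list)
    it = iter(flat)
    # zip(it, it) consumes the same iterator two items at a time: (label, id).
    for label, vertex_id in zip(it, it):
        grouped[label].append(vertex_id)
    return dict(grouped)


# Values copied from the console result above, keyed by the start vertex ID.
result = {"1234": ["route", 68, "route", 76, "route", 77, "route", 1233]}
print({vid: pair_up(flat) for vid, flat in result.items()})
# {'1234': {'route': [68, 76, 77, 1233]}}
```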

UPDATED based on discussion in comments

There are, of course, many ways that this query could be constructed. Both group and project are viable alternatives, but they have the disadvantage that they need to aggregate more data on the server before any results can start streaming to the client. It may be more effective to just use elementMap as shown below. You will get the starting vertex ID as part of the results, but this may be a worthwhile tradeoff. Testing against your data set will reveal whether this approach performs better. You would then just need to process these results in your application.

gremlin> g.V("1234").
......1>   bothE().
......2>   limit(10).
......3>   elementMap() 

==>[id:45033,label:route,IN:[id:68,label:airport],OUT:[id:1234,label:airport],dist:334]
==>[id:45034,label:route,IN:[id:76,label:airport],OUT:[id:1234,label:airport],dist:102]
==>[id:45035,label:route,IN:[id:77,label:airport],OUT:[id:1234,label:airport],dist:135]
==>[id:45036,label:route,IN:[id:1233,label:airport],OUT:[id:1234,label:airport],dist:584]
==>[id:59016,label:contains,IN:[id:1234,label:airport],OUT:[id:3741,label:continent]]
==>[id:55513,label:contains,IN:[id:1234,label:airport],OUT:[id:3710,label:country]]
==>[id:12370,label:route,IN:[id:1234,label:airport],OUT:[id:68,label:airport],dist:334]
==>[id:13829,label:route,IN:[id:1234,label:airport],OUT:[id:76,label:airport],dist:102]
==>[id:45030,label:route,IN:[id:1234,label:airport],OUT:[id:1233,label:airport],dist:584]
==>[id:13963,label:route,IN:[id:1234,label:airport],OUT:[id:77,label:airport],dist:135]

Without a limit step, this will still return all 10K+ results, but they can start streaming right away rather than first needing to be consolidated on the server.
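Processing those `elementMap` rows into the `IN`/`OUT` payload from the question could look roughly like the Python sketch below. It assumes each row has been deserialized into a dict with plain string keys (`"id"`, `"label"`, `"IN"`, `"OUT"`) mirroring the console output above; real drivers may return token objects such as `T.id` or `Direction.IN` as keys instead, in which case the lookups need adjusting. `to_payload` is a hypothetical name.

```python
from typing import Any, Dict, Iterable, List, Mapping

Payload = Dict[str, Dict[str, List[Any]]]


def to_payload(start_id: Any, edge_maps: Iterable[Mapping[str, Any]]) -> Payload:
    """Build {direction: {edge label: [other vertex ids]}} from elementMap() rows.

    Direction is relative to the start vertex: if the edge's OUT vertex is the
    start vertex the edge is filed under "OUT", otherwise under "IN".
    Assumption: rows use plain string keys, as printed by the Gremlin console.
    """
    payload: Payload = {}
    for edge in edge_maps:  # consumes rows one at a time, so a stream works too
        if edge["OUT"]["id"] == start_id:
            direction, other_id = "OUT", edge["IN"]["id"]
        else:
            direction, other_id = "IN", edge["OUT"]["id"]
        # Keep only the label and the other vertex ID; edge properties are dropped.
        payload.setdefault(direction, {}).setdefault(edge["label"], []).append(other_id)
    return payload


# A few rows copied from the console output above.
rows = [
    {"id": 45033, "label": "route", "IN": {"id": 68, "label": "airport"},
     "OUT": {"id": 1234, "label": "airport"}, "dist": 334},
    {"id": 45034, "label": "route", "IN": {"id": 76, "label": "airport"},
     "OUT": {"id": 1234, "label": "airport"}, "dist": 102},
    {"id": 59016, "label": "contains", "IN": {"id": 1234, "label": "airport"},
     "OUT": {"id": 3741, "label": "continent"}},
    {"id": 12370, "label": "route", "IN": {"id": 1234, "label": "airport"},
     "OUT": {"id": 68, "label": "airport"}, "dist": 334},
]
print(to_payload(1234, rows))
# {'OUT': {'route': [68, 76]}, 'IN': {'contains': [3741], 'route': [68]}}
```

Because the function consumes its input one row at a time, it can be fed results as they stream in; only the compact payload is kept in memory, not the full edge maps.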

Thank you for the query. As I hoped I had explained above, `group` was an example attempt, not the answer. I started from how I wanted the data to look, and I thought `group` was a means to that. If there are more efficient solutions, I'm happy to learn and discuss them. Please focus on my question rather than my attempted solution. — jtarchie, Jun 01 '22 at 13:52
In your initial query that uses `valueMap` and `project`, it is quite likely the fetching of all the vertex properties that is slowing down the query. The fastest way to get results might well be to use `elementMap` and, rather than using `group` or `project`, just let the results start streaming back to the client immediately. — Kelvin Lawrence, Jun 01 '22 at 15:15
