
I'm having trouble coming up with a single query that gets the results I'm looking for.

In this scenario I have reports, stores, and organizations.

Reports only ever belong to one store or one org, never both. Stores always belong to exactly one org. An org can be a sub-org of at most one other org, but sub-orgs can be chained across multiple levels.

I'm looking for a single query that takes a report ID and returns the single top-level org, regardless of how many levels of sub-orgs there are.

Scenarios:

  1. report > store > org
  2. report > store > org > org > org
  3. report > org
  4. report > org > org > org
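Because every report, store and sub-org has exactly one parent, all four scenarios are simple chains, and "top-level org" just means "keep following the single upward link until there isn't one". A minimal plain-Python sketch of that idea (not Gremlin; the vertex IDs and the `PARENT` dict are hypothetical stand-ins for the graph):

```python
# Hypothetical in-memory model of the hierarchy: every report, store and
# sub-org points at exactly one parent; top-level orgs have no parent (None).
from typing import Dict, Optional

PARENT: Dict[str, Optional[str]] = {
    # Scenario 1: report > store > org
    "report-1": "store-1", "store-1": "org-A", "org-A": None,
    # Scenario 2: report > store > org > org > org
    "report-2": "store-2", "store-2": "org-B1", "org-B1": "org-B2",
    "org-B2": "org-B3", "org-B3": None,
    # Scenario 3: report > org
    "report-3": "org-C", "org-C": None,
    # Scenario 4: report > org > org > org
    "report-4": "org-D1", "org-D1": "org-D2", "org-D2": "org-D3", "org-D3": None,
}


def top_level_org(report_id: str) -> str:
    """Follow parent links until reaching a vertex that has no parent."""
    current = report_id
    parent = PARENT[current]
    while parent is not None:
        current = parent
        parent = PARENT[current]
    return current


for rid in ("report-1", "report-2", "report-3", "report-4"):
    print(rid, "->", top_level_org(rid))
```

The point of the model is that the stopping condition is "no further parent", not "reached a vertex labelled org".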

The current edge labels are simply the out- and in-vertex label names combined: reportStore & storeReport, reportOrg & orgReport, storeOrg & orgStore, suborgOrg & orgSuborg.

The furthest I've gotten so far is:

g.V('<id>').until(has('label','org')).repeat(out()).limit(1)

but clearly this isn't a direct route, and it stops as soon as it reaches the first org.
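Why the attempt stops early: with until() placed before repeat(), Gremlin tests the condition before each step (while/do looping), so the first vertex labelled org ends the loop. A plain-Python imitation on a hypothetical scenario-2 chain, assuming only upward edges exist, contrasts that with stopping only when a vertex has no outgoing edges:

```python
# Plain-Python imitation of g.V(id).until(hasLabel('org')).repeat(out())
# on a hypothetical scenario-2 chain: report > store > org > org > org.
# ASSUMPTION: only upward edges exist, so each vertex has at most one out edge.
from typing import Dict, List

LABEL: Dict[str, str] = {
    "report-2": "report", "store-2": "store",
    "org-B1": "org", "org-B2": "org", "org-B3": "org",
}
OUT: Dict[str, List[str]] = {
    "report-2": ["store-2"], "store-2": ["org-B1"],
    "org-B1": ["org-B2"], "org-B2": ["org-B3"], "org-B3": [],
}


def until_first_org(start: str) -> str:
    """until() before repeat(): test the label first, then step out()."""
    current = start
    while LABEL[current] != "org":
        current = OUT[current][0]  # single out edge in this chain
    return current


def until_no_out_edges(start: str) -> str:
    """Keep stepping out() until the vertex has no outgoing edges."""
    current = start
    while OUT[current]:
        current = OUT[current][0]
    return current


print(until_first_org("report-2"))     # org-B1: stops at the first org
print(until_no_out_edges("report-2"))  # org-B3: the top-level org
```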

Steve Eggering

1 Answer


When asking questions about Gremlin, it is always best to include a Gremlin script that creates some sample data, like this one:

g.addV('report').property('name','report-a').as('a').
  addV('report').property('name','report-b').as('b').
  addV('store').property('name','store').as('s').
  addV('org').property('name','org-z').as('z').
  addV('org').property('name','org-y').as('y').
  addV('org').property('name','org-x').as('x').
  addV('org').property('name','org-w').as('w').
  addE('link').from('a').to('s').
  addE('link').from('s').to('z').
  addE('link').from('z').to('y').
  addE('link').from('y').to('x').
  addE('link').from('b').to('w').iterate()

In the data above, I gather that for "report-a" you'd want to return "org-x" and for "report-b" you'd want to return "org-w" (i.e., from a leaf in the tree, traverse up to the topmost vertex). Your edge labels didn't seem to have any bearing on the query, so I omitted them from the sample for simplicity's sake.

You were correct to use repeat(), but as you mentioned, using until() has the potential to end your loop too early. In this case, given your data structure, you can let the loop terminate on its own: it simply stops iterating when it reaches the last "org", because out() from that vertex finds nothing further to traverse. The important part is to emit() that last vertex, which you can detect by looking for a vertex with no outgoing edges; in Gremlin that is __.not(outE()). Your working query is thus:

gremlin> g.V().has('report','name','report-a').
......1>   repeat(out()).
......2>     emit(__.not(outE())).
......3>   values('name')
==>org-x
gremlin> g.V().has('report','name','report-b').
......1>   repeat(out()).
......2>     emit(__.not(outE())).
......3>   values('name')
==>org-w
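To make the loop semantics concrete, here is a plain-Python imitation of repeat(out()).emit(__.not(outE())) over the same sample graph (all edges 'link'). It is only a model of the behaviour, not how TinkerPop executes it: with emit() after repeat(), each traverser is tested after every iteration, matching traversers are copied to the output, and the loop ends by itself once no traversers remain.

```python
# Plain-Python imitation of
#   g.V().has('report','name',<name>).repeat(out()).emit(__.not(outE())).values('name')
# over the sample graph from the answer (every edge is labelled 'link').
from typing import Dict, List

EDGES: Dict[str, List[str]] = {
    "report-a": ["store"],
    "report-b": ["org-w"],
    "store": ["org-z"],
    "org-z": ["org-y"],
    "org-y": ["org-x"],
    "org-x": [],
    "org-w": [],
}


def repeat_out_emit_leaves(start: str) -> List[str]:
    emitted: List[str] = []
    traversers = [start]
    while traversers:  # repeat() ends on its own when no traversers are left
        traversers = [nxt for v in traversers for nxt in EDGES[v]]  # out()
        # emit() after repeat(): test each traverser after the step;
        # emitted traversers are copied out while the loop carries on.
        emitted.extend(v for v in traversers if not EDGES[v])  # __.not(outE())
    return emitted


print(repeat_out_emit_leaves("report-a"))  # ['org-x']
print(repeat_out_emit_leaves("report-b"))  # ['org-w']
```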
stephen mallette
  • Thank you for this thorough answer. Part of my issue seems to stem from Azure Cosmos DB not fully supporting Gremlin queries, most notably `_`. Out of curiosity, what is the purpose of the iterate command at the end of your list of adds? – Steve Eggering Jan 29 '19 at 14:15
  • `__` is supported. It's just the name of the class that spawns anonymous traversals. It can be left off in some cases, like `out()` inside `repeat()`. I used it with `not()`, however, to be explicit that it was an anonymous traversal and not the predicate `P.not`. The `iterate()` command is used to iterate the traversal without returning any results. Typically, it is used to generate side-effects, which in this case are the mutations to the graph. – stephen mallette Jan 29 '19 at 14:24
  • I just realized this was a double underscore, which does seem to be working. However, this query seems to get stuck in a loop as well. All organizations have other out edges besides reports and stores, which I think makes this query not work for my case. – Steve Eggering Jan 29 '19 at 14:35
  • Well, that's where additional filters on `outE()` come in: include the edge labels there that you want considered as part of the `emit()`. – stephen mallette Jan 29 '19 at 16:38
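The loop described in these comments is what you would expect if every relationship is stored as a pair of edges (reportStore and storeReport, and so on): every vertex then has some outgoing edge, so __.not(outE()) never matches and repeat(out()) keeps walking back and forth. Passing only the "upward" labels to both out() and outE() breaks that, e.g. something like `repeat(out('reportStore','reportOrg','storeOrg','suborgOrg')).emit(__.not(outE('reportStore','reportOrg','storeOrg','suborgOrg')))`. The plain-Python sketch below models both cases. It assumes, from the label names only, that reportStore runs report -> store, storeOrg store -> org and suborgOrg sub-org -> parent org; the vertices are hypothetical, and the step cap exists only so the unfiltered run can be shown not to finish.

```python
# Hypothetical graph: each relationship is stored as a pair of edges, one in each
# direction, using the label names from the question.
# ASSUMPTION (from the names): reportStore runs report -> store, storeOrg runs
# store -> org, suborgOrg runs sub-org -> parent org; the reverse labels go back down.
from typing import Dict, List, Optional, Set, Tuple

OUT_EDGES: Dict[str, List[Tuple[str, str]]] = {  # vertex -> [(edge label, target)]
    "report-1": [("reportStore", "store-1")],
    "store-1": [("storeReport", "report-1"), ("storeOrg", "org-1")],
    "org-1": [("orgStore", "store-1"), ("suborgOrg", "org-2")],
    "org-2": [("orgSuborg", "org-1")],
}

# Labels assumed to point "up" the hierarchy.
UPWARD: Set[str] = {"reportStore", "reportOrg", "storeOrg", "suborgOrg"}


def out(v: str, labels: Optional[Set[str]]) -> List[str]:
    """Like out() when labels is None, or out(label1, label2, ...) otherwise."""
    return [target for label, target in OUT_EDGES[v] if labels is None or label in labels]


def repeat_emit(start: str, labels: Optional[Set[str]],
                max_steps: int = 10) -> Tuple[List[str], bool]:
    """Imitates repeat(out(labels)).emit(__.not(outE(labels))).

    Returns (emitted vertices, whether the loop ended on its own within max_steps).
    """
    emitted: List[str] = []
    traversers = [start]
    for _ in range(max_steps):
        if not traversers:
            return emitted, True
        traversers = [n for v in traversers for n in out(v, labels)]
        emitted.extend(v for v in traversers if not out(v, labels))
    return emitted, not traversers


print(repeat_emit("report-1", None))    # ([], False): every vertex has an out edge
print(repeat_emit("report-1", UPWARD))  # (['org-2'], True)
```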