
Could you please help me write a query that returns each source vertex in my traversal along with its associated edges and vertices as arrays on that source vertex? In short, I need a result set comprising an array of 3-tuples, with item 1 of each tuple being the source vertex and items 2 and 3 being the associated arrays.

Thanks!

EDIT 1: Expanded on the graph data and added my current problem query. EDIT 2: Improved Gremlin sample graph code (apologies, didn't think anyone would actually run it.)

Sample Graph

g.addV("blueprint").property("name","Mall").
addV("blueprint").property("name","HousingComplex").
addV("blueprint").property("name","Airfield").
addV("architect").property("name","Tom").
addV("architect").property("name","Jerry").
addV("architect").property("name","Sylvester").
addV("buildingCategory").property("name","Civil").
addV("buildingCategory").property("name","Commercial").
addV("buildingCategory").property("name","Industrial").
addV("buildingCategory").property("name","Military").
addV("buildingCategory").property("name","Residential").
V().has("name","Tom").addE("designed").to(V().has("name","HousingComplex")).
V().has("name","Tom").addE("assisted").to(V().has("name","Mall")).
V().has("name","Jerry").addE("designed").to(V().has("name","Airfield")).
V().has("name","Jerry").addE("assisted").to(V().has("name","HousingComplex")).
V().has("name","Sylvester").addE("designed").to(V().has("name","Mall")).
V().has("name","Sylvester").addE("assisted").to(V().has("name","Airfield")).
V().has("name","Sylvester").addE("assisted").to(V().has("name","HousingComplex")).
V().has("name","Mall").addE("classification").to(V().has("name","Commercial")).
V().has("name","HousingComplex").addE("classification").to(V().has("name","Residential")).
V().has("name","Airfield").addE("classification").to(V().has("name","Civil"))

Please note that the above is a very simplified rendering of our data.

Needed Query Results

I need to bring back each blueprint vertex as a base with each of its associated edges / vertices as arrays.

My Current Solution

Currently I run a very cumbersome query that gets the blueprints and assigns them a step label, gets the architects and assigns them a step label, then selects both labels. The solution is OK; however, it gets messy when I need to include edges or get the blueprint classification vertices (industrial, military, residential, commercial, etc.). In effect, the more associated data I need to pull back for each blueprint, the sloppier my solution becomes.

My current query looks something like this:

g.V().hasLabel("blueprint").as("blueprints").
outE().or(hasLabel("designed"),hasLabel("assisted")).inV().as("architects").
select("blueprints").coalesce(out("classification"),constant()).as("classifications").
select("blueprints","architects","classifications")

The above produces a lot of duplication. If b is the number of blueprints, a the number of architects per blueprint, and c the number of classifications per blueprint, the result set comprises b * a * c rows. I'd like one row per blueprint, with an array of its associated architects and an array of its associated classifications, if any.

Complications

I'm trying to do this in one query so that I can get all blueprint data from the graph to populate a filtered list. Once I have the list comprising all of the vertices, edges, and their properties, users can then click links to blobs, browse to project sites, etc. Accordingly, I've got pagination as well as filtering to think about and I'd prefer to make one trip to the server each time I get a new page or the filters change.

Beans
  • You may want to add more examples of your data so the query becomes more relevant. For instance, you refer to `industrial` and `military`, probably as types of `blueprint`, so you may want to extend the sample. You could also provide the query you are using now, so others will not follow the same path. – azbarcea Oct 24 '19 at 01:26
  • I've done as requested. – Beans Oct 24 '19 at 09:05
  • Please test your data load script. It is not valid syntax. The most notable issue is the mixing of "id" as a property key with `T.id`, which represents an actual graph element identifier. Also, your `addE()` statements are not correct: the `from()` and `to()` modulators need to point to step labels or a vertex (in some cases you have the latter "correct", but because of the `T.id` issue I mentioned they will return no vertex). – stephen mallette Oct 24 '19 at 12:48
  • I've done as requested. – Beans Oct 24 '19 at 13:06
  • Not sure if it helps to add that the result I'm looking for is a set of 3-tuples where for each tuple: item 1 is the blueprint, item 2 is the blueprint's array of architects, and item 3 is the blueprint's array of classifications. – Beans Oct 24 '19 at 15:54

1 Answer


I figured out an answer; however, it quadruples the compute charge for the query. Not sure if this can be optimized further.

g.V().hasLabel("blueprint").
project("blueprints","architects").
by().
by(outE().or(hasLabel("designed"),hasLabel("assisted")).inV().dedup().fold())

I just solved for blueprints and architects, but classifications just needs another by(...traversal...) and projection label.
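A minimal sketch of that extension, written against the sample graph from the question. It assumes the sample graph's edge direction, where "designed" and "assisted" point from architect to blueprint, so the architects are reached with in() rather than the outE()/inV() used above; flip the direction if your edges run the other way. The projection key names are illustrative only.

```groovy
// One result per blueprint: the vertex itself plus two folded arrays.
// Assumption: architect -[designed|assisted]-> blueprint, as in the sample graph.
g.V().hasLabel("blueprint").
  project("blueprint", "architects", "classifications").
    by().                                                // the blueprint vertex
    by(__.in("designed", "assisted").dedup().fold()).    // architects as a list
    by(__.out("classification").fold())                  // classifications as a list (may be empty)
```

Because fold() always emits a list, a blueprint without a classification should still produce a row with an empty array, so no coalesce() fallback is needed. The connecting edges could be collected the same way with a further key and by(__.inE("designed", "assisted").fold()).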

I may have to just get the blueprints in one query, get each of their associated items in parallel queries, then put it all together in the API. That would be very bad design for the API data layer but may be necessary for performance reasons.
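Before splitting the work into parallel queries, it may be worth restricting the projection to a single page, so that the fan-out is only paid for the blueprints actually displayed. The following is a sketch under stated assumptions: paging by name with a page size of 20 and a filter on the classification name. The sort key, page bounds, and filter value are hypothetical, and the edge directions are those of the sample graph in the question.

```groovy
// Filter and page first, then project only the blueprints on this page.
g.V().hasLabel("blueprint").
  where(__.out("classification").has("name", "Commercial")).  // hypothetical filter
  order().by("name").                                         // stable order across pages
  range(0, 20).                                               // hypothetical first page of 20
  project("blueprint", "architects", "classifications").
    by().
    by(__.in("designed", "assisted").dedup().fold()).
    by(__.out("classification").fold())
```

This keeps one round trip per page or filter change. Whether it actually lowers the compute charge depends on the graph database and on whether the filter and sort can be served by an index.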

Beans
  • this `outE().or(hasLabel("designed"),hasLabel("assisted")).inV()` can be simplified to `out('designed','assisted')` - perhaps your graph database isn't optimizing that `or()` properly – stephen mallette Oct 24 '19 at 21:02
  • Thanks for the suggestion; however, it only reduces the processing by 0.4%. – Beans Oct 24 '19 at 21:21
  • not sure what graph database you're using but most graphs don't optimize the lookup of vertices just by label. most graphs will end up doing a full graph scan to find them. typically traversals will start from a vertex or small subset of vertices identified by an index. if you're doing the full graph scan and there's lots of "blueprint" vertices to be found for which you then need to traverse a lot of edges to collect all the data you want, i think i could see why it might be "slow". – stephen mallette Oct 24 '19 at 21:44
  • As mentioned in the post, I'm using a simplified version of the data, which necessarily means a simplified version of the query. Quadruple the compute to do the project(). – Beans Oct 24 '19 at 22:00