I read the question "Query to retrieve all paths traversable from a given vertex", which describes how to find all paths from a node using Gremlin. I'm trying to understand what performance I can reasonably expect from this kind of query on a real-life dataset in AWS Neptune.
I've limited which edge labels are queried by passing them to bothE(). However, performance degrades rapidly beyond a depth of about 5 (I believe the exact point depends on the branching factor of the graph).
Mainly, I want to know what is reasonable to expect from Neptune. The property graph has about 750M nodes, 1.5B edges, and 1.5B properties, and it is fairly interconnected. The instance type is db.r5.4xlarge.
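To put the depth-5 numbers in perspective, here is a rough back-of-the-envelope sketch. It rests on hypothetical simplifications that a real graph won't satisfy exactly. Every vertex is assumed to have the same degree k. The neighbourhood within five hops is assumed to be tree-like, so simplePath() never prunes anything. All edges are assumed to carry one of the queried labels. Under those assumptions, a path that has just arrived at a vertex can leave along every edge except the one it came in on, so the number of paths of exactly d hops is k(k-1)^(d-1). The function name `paths_per_depth` is illustrative.

```python
def paths_per_depth(degree: int, max_depth: int) -> list[int]:
    """Number of paths of exactly d hops from a start vertex, for d = 1..max_depth.

    Hypothetical model: every vertex has the same degree and the graph is
    tree-like (no cycles), so simplePath() never filters anything out.
    """
    counts = []
    frontier = degree  # first hop: any incident edge
    for _ in range(max_depth):
        counts.append(frontier)
        frontier *= degree - 1  # later hops: any edge except the one we arrived on
    return counts


if __name__ == "__main__":
    # Average degree from the stated sizes: each edge touches two vertices.
    vertices, edges = 750_000_000, 1_500_000_000
    avg_degree = 2 * edges // vertices  # 4
    print(avg_degree, paths_per_depth(avg_degree, 5))
    # One high-degree region near the start changes the picture completely.
    print(paths_per_depth(100, 5))
```

With the stated sizes, the average degree is 2 × 1.5B / 750M = 4. That would predict only 4, 12, 36, 108 and 324 paths at depths 1 to 5. At degree 100, the depth-5 figure is already 9,605,960,100. If the observed counts are far above the average-degree estimate, the blow-up is probably driven by the degree distribution (a few high-degree vertices within a few hops of 'mynode') rather than by the overall size of the graph.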
Thanks for any help!
Example Query:
g.V('mynode').repeat(bothE('lbl1', 'lbl2', 'lbl3').otherV().simplePath()).until(__.not(bothE('lbl1', 'lbl2', 'lbl3').simplePath()).or().loops().is(eq(5))).path().count()
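To see where the traversers come from, the query's logic can be mirrored on a small in-memory graph. This is a hypothetical Python sketch, not how Neptune executes the query. `count_paths`, the adjacency format and the edge IDs are all illustrative. The adjacency list is assumed to hold only edges labelled lbl1, lbl2 or lbl3. Besides the final count, it reports how many partial paths survive each loop. That is roughly the number of traversers the engine has to carry, which can be much larger than what path().count() finally returns. Note also that, as written, the until() clause calls bothE() a second time for every traverser on every loop.

```python
from collections import Counter

# Hypothetical in-memory graph: vertex -> list of (edge_id, neighbour),
# containing only edges whose label is lbl1, lbl2 or lbl3.
Graph = dict[str, list[tuple[str, str]]]


def count_paths(adj: Graph, start: str, max_loops: int = 5):
    """Mirror repeat(bothE(..).otherV().simplePath())
         .until(__.not(bothE(..).simplePath()).or().loops().is(eq(max_loops)))
         .path().count()

    Returns (completed path count, {loop: partial paths alive after that loop}).
    """
    completed = 0
    alive = Counter()

    def step(v, seen_vertices, seen_edges, loops):
        nonlocal completed
        for edge, w in adj.get(v, []):
            # simplePath(): a new vertex implies the edge is new too
            if w in seen_vertices:
                continue
            vs, es, n = seen_vertices | {w}, seen_edges | {edge}, loops + 1
            alive[n] += 1
            # until(): stop at max_loops, or when w has no edge not already on the path
            if n == max_loops or all(e in es for e, _ in adj.get(w, [])):
                completed += 1
            else:
                # Loop again; if no simple extension exists, this traverser is dropped
                step(w, vs, es, n)

    step(start, frozenset({start}), frozenset(), 0)
    return completed, dict(alive)


if __name__ == "__main__":
    triangle = {
        "x": [("e1", "y"), ("e3", "z")],
        "y": [("e1", "x"), ("e2", "z")],
        "z": [("e2", "y"), ("e3", "x")],
    }
    print(count_paths(triangle, "x"))
```

On the triangle above this returns (0, {1: 2, 2: 2}). Both paths reach a vertex whose only unused edge leads back to x, so until() does not stop them, and the next simplePath() then filters them out. If this mirror matches Gremlin's semantics for until() placed after repeat(), paths that end in a cycle like this are dropped from the count rather than counted.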
An example profile is at https://pastebin.com/M6r4Xr54.
I've been profiling queries with the profile endpoint, but it's been difficult to understand why performance degrades. The only explanation I can see is the volume of data being returned, and that volume isn't very large at depth 5.