
I'm currently evaluating AWS Neptune as a potential graph database (specifically comparing it to Azure Cosmos Graph DB). The scenario is that I have a bunch of test data which I'm loading with the bulk loader, after which I'll run some performance benchmarks against the database.

I'm curious about how best to model the data in AWS Neptune.

In Azure Cosmos Graph DB, edges are unidirectional and are stored on the source vertex. So queries that need to look for inbound edges will be slow unless a reverse edge is also stored on the destination vertex.
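For context, here is a minimal sketch of that workaround in Gremlin (the vertex ids and edge labels are made up for illustration): each relationship gets a second, reversed edge so that an inbound question can be answered with an outbound traversal.

    // Pair every edge with an explicit reverse edge (the Cosmos-style workaround).
    g.V('alice').addE('worksFor').to(g.V('acme'))
    g.V('acme').addE('employs').to(g.V('alice'))

    // "Who works for acme?" can then be answered with an outbound traversal.
    g.V('acme').out('employs')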

So far I haven't found an answer on whether edges need to be optimised in a similar way in AWS Neptune.

This description of the Neptune internal data model (https://docs.aws.amazon.com/neptune/latest/userguide/feature-overview-data-model.html) suggests that there's a common storage format used for vertices, edges and properties, which is indexed using the three most common access patterns.

So I would assume based on this that:

  1. we need to store both incoming and outgoing edges (see the sketch after this list), or
  2. we need to enable "OSGP Index Creation Using Lab Mode" to index from both directions
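To make option 1 concrete, a mirrored-edge model would keep the benchmark queries outbound-only, whereas relying purely on Neptune's built-in indexes means traversing inbound edges directly (ids and labels again made up):

    // Option 1: mirrored edges, so inbound questions stay outbound.
    g.V('acme').out('employs').count()

    // No mirrored edges: traverse incoming edges and rely on Neptune's indexes.
    g.V('acme').in('worksFor').count()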

What is the best approach here?

infl3x

1 Answer


In general, when traversing edges in Gremlin, it is good practice to specify the edge labels that you are interested in. This helps the query engine discard other edges from consideration more easily. This is especially true for any steps that need to look at incoming edges, such as in(), inE(), both() and bothE().
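For example (using a made-up edge label), the labelled form gives the engine far less work to do than the unlabelled one:

    // Label provided: only 'worksFor' edges need to be considered.
    g.V('acme').in('worksFor').valueMap()

    // No label: every incoming edge, whatever its label, has to be considered.
    g.V('acme').in().valueMap()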

Specific to Amazon Neptune, so long as you are able to provide one or more labels to those steps, the OSGP index should not be necessary.

Kelvin Lawrence
  • Yes, but is it necessary to create edges in both directions between vertices for this to work efficiently? – infl3x Dec 09 '20 at 06:20
  • So long as you are able to provide edge labels in your Gremlin queries, they should be efficient regardless of edge direction. If you are unable to provide labels, then outgoing edge traversals will be more efficient. – Kelvin Lawrence Dec 09 '20 at 12:50