I am creating an AWS Neptune graph that will eventually have billions of nodes and edges. At this data volume, are there best practices for designing the schema to optimize query performance? One thing in particular I'm curious about is whether there is a major performance difference between querying by property and querying by ID:
g.V().has('application', 'applicationId', 'application_123')...
vs.
g.V('application_123')...
I would assume that starting a query with an ID in a graph with billions of nodes and edges would be substantially faster, but I was wondering if anyone has experience with this. If so, I could give my nodes IDs that I can derive at query time, so that I can always query by ID. For instance, application nodes would have IDs like application_123, and phone nodes would have IDs like phone_1234567890, where (123) 456 7890 is the phone number. Would this improve query performance? Is there anything else I can do to improve query performance on a graph of this size?
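For reference, here's a sketch of what I mean by assigning deterministic IDs at write time. This assumes Neptune's support for user-supplied string vertex IDs via property(id, ...) (where id is the statically imported T.id); the label, property names, and values are just illustrative:

```groovy
// Sketch: assign a deterministic, user-supplied string ID at write time.
// `id` is the static import T.id, not a property key.
g.addV('application').
  property(id, 'application_123').            // ID I can reconstruct at query time
  property('applicationId', 'application_123')

// Later, look up directly by ID instead of scanning a property:
g.V('application_123')
```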