Neo4j SDN 4 GraphId performance vs Index

Question

At my Neo4j/SDN 4 application all of my Cypher queries are based on internal Neo4j IDs.

This is an issue because I can't rely on these IDs at my web application urls. Neo4j can reuse these IDs so there is a good chance that at some time in future under the same ID we can found absolutely another node.

I tried to re-implement this logic based on the following solution: Using the graph to control unique id generation but noticed a query performance degradation.

From a theoretical point of view, should a Cypher query based on the property with @Index(unique = true, primary = true)

for example:

@Index(unique = true, primary = true)
private Long uid;

entity.uid = {someId}

work with the same performance as a Cypher query which is based on internal Neo4j ID:

id(entity) = {someId}

UPDATED

This is :schema output:

Indexes
   ON :BaseEntity(uid) ONLINE
   ON :Characteristic(lowerName) ONLINE
   ON :CharacteristicGroup(lowerName) ONLINE
   ON :Criterion(lowerName) ONLINE
   ON :CriterionGroup(lowerName) ONLINE
   ON :Decision(lowerName) ONLINE
   ON :FlagType(name) ONLINE (for uniqueness constraint)
   ON :HAS_VALUE_ON(value) ONLINE
   ON :HistoryValue(originalValue) ONLINE
   ON :Permission(code) ONLINE (for uniqueness constraint)
   ON :Role(name) ONLINE (for uniqueness constraint)
   ON :User(email) ONLINE (for uniqueness constraint)
   ON :User(username) ONLINE (for uniqueness constraint)
   ON :Value(value) ONLINE

Constraints
   ON ( flagtype:FlagType ) ASSERT flagtype.name IS UNIQUE
   ON ( permission:Permission ) ASSERT permission.code IS UNIQUE
   ON ( role:Role ) ASSERT role.name IS UNIQUE
   ON ( user:User ) ASSERT user.email IS UNIQUE
   ON ( user:User ) ASSERT user.username IS UNIQUE

As you can see I have an index on :BaseEntity(uid)

BaseEntity is a base class in my entity hierarchy, for example:

@NodeEntity
public abstract class BaseEntity {

    @GraphId
    private Long id;

    @Index(unique = false)
    private Long uid;

    private Date createDate;

    private Date updateDate;

...

}

@NodeEntity
public class Commentable extends BaseEntity {
...
}

@NodeEntity
public class Decision extends Commentable {

    private String name;

}

Will this uid index be used when I'm looking for example for (d:Decision) WHERE d.uid = {uid} ?

PROFILE resuls - internal ID vs indexed property

Query based on internal ID

PROFILE MATCH (parentD)-[:CONTAINS]->(childD:Decision) 
WHERE id(parentD) = 1474333 
MATCH (childD)-[relationshipValueRel1475199:HAS_VALUE_ON]-(filterCharacteristic1475199) 
WHERE id(filterCharacteristic1475199) = 1475199 
WITH relationshipValueRel1475199, childD 
WHERE  ([1, 19][0] <= relationshipValueRel1475199.value <=  [1, 19][1] )  
WITH childD  
MATCH (childD)-[relationshipValueRel1474358:HAS_VALUE_ON]-(filterCharacteristic1474358) 
WHERE id(filterCharacteristic1474358) = 1474358 
WITH relationshipValueRel1474358, childD 
WHERE  (ANY (id IN ['Compact'] WHERE id IN relationshipValueRel1474358.value ))  
WITH childD  
MATCH (childD)-[relationshipValueRel1475193:HAS_VALUE_ON]-(filterCharacteristic1475193) 
WHERE id(filterCharacteristic1475193) = 1475193 
WITH relationshipValueRel1475193, childD 
WHERE  (ANY (id IN ['16:9', '3:2', '4:3', '1:1'] 
WHERE id IN relationshipValueRel1475193.value ))  
WITH childD  
OPTIONAL MATCH (childD)-[vg:HAS_VOTE_ON]->(c) 
WHERE id(c) IN [1474342, 1474343, 1474340, 1474339, 1474336, 1474352, 1474353, 1474350, 1474351, 1474348, 1474346, 1474344] 
WITH childD, vg.avgVotesWeight as weight, vg.totalVotes as totalVotes 
WITH * MATCH (childD)-[ru:CREATED_BY]->(u:User)  
WITH ru, u, childD , toFloat(sum(weight)) as weight, toInt(sum(totalVotes)) as totalVotes  
ORDER BY  weight DESC 
SKIP 0 LIMIT 10 
RETURN ru, u, childD AS decision, weight, totalVotes, 
[ (parentD)<-[:DEFINED_BY]-(entity)<-[:COMMENTED_ON]-(comg:CommentGroup)-[:COMMENTED_FOR]->(childD) | {entityId: id(entity),  types: labels(entity), totalComments: toInt(comg.totalComments)} ] AS commentGroups, 
[ (parentD)<-[:DEFINED_BY]-(c1)<-[vg1:HAS_VOTE_ON]-(childD) | {criterionId: id(c1),  weight: vg1.avgVotesWeight, totalVotes: toInt(vg1.totalVotes)} ] AS weightedCriteria, 
[ (parentD)<-[:DEFINED_BY]-(ch1:Characteristic)<-[v1:HAS_VALUE_ON]-(childD)  WHERE NOT ((ch1)<-[:DEPENDS_ON]-())  | {characteristicId: id(ch1),  value: v1.value, totalHistoryValues: toInt(v1.totalHistoryValues), description: v1.description, valueType: ch1.valueType, visualMode: ch1.visualMode} ] AS valuedCharacteristics

PROFILE output:

Cypher version: CYPHER 3.1, planner: COST, runtime: INTERPRETED. 350554 total db hits in 238 ms.

Query based on indexed property uid

PROFILE MATCH (parentD)-[:CONTAINS]->(childD:Decision) 
WHERE parentD.uid = 61 
MATCH (childD)-[relationshipValueRel1475199:HAS_VALUE_ON]-(filterCharacteristic1475199) 
WHERE filterCharacteristic1475199.uid = 15 
WITH relationshipValueRel1475199, childD 
WHERE  ([1, 19][0] <= relationshipValueRel1475199.value <=  [1, 19][1] )  
WITH childD  
MATCH (childD)-[relationshipValueRel1474358:HAS_VALUE_ON]-(filterCharacteristic1474358) 
WHERE filterCharacteristic1474358.uid = 10 
WITH relationshipValueRel1474358, childD 
WHERE  (ANY (id IN ['Compact'] WHERE id IN relationshipValueRel1474358.value ))  
WITH childD  
MATCH (childD)-[relationshipValueRel1475193:HAS_VALUE_ON]-(filterCharacteristic1475193) 
WHERE filterCharacteristic1475193.uid = 14 
WITH relationshipValueRel1475193, childD 
WHERE  (ANY (id IN ['16:9', '3:2', '4:3', '1:1'] 
WHERE id IN relationshipValueRel1475193.value ))  
WITH childD  
OPTIONAL MATCH (childD)-[vg:HAS_VOTE_ON]->(c) 
WHERE c.uid IN [26, 27, 24, 23, 20, 36, 37, 34, 35, 32, 30, 28] 
WITH childD, vg.avgVotesWeight as weight, vg.totalVotes as totalVotes 
WITH * MATCH (childD)-[ru:CREATED_BY]->(u:User)  
WITH ru, u, childD , toFloat(sum(weight)) as weight, toInt(sum(totalVotes)) as totalVotes  
ORDER BY  weight DESC 
SKIP 0 LIMIT 10 
RETURN ru, u, childD AS decision, weight, totalVotes, 
[ (parentD)<-[:DEFINED_BY]-(entity)<-[:COMMENTED_ON]-(comg:CommentGroup)-[:COMMENTED_FOR]->(childD) | {entityId: id(entity),  types: labels(entity), totalComments: toInt(comg.totalComments)} ] AS commentGroups, 
[ (parentD)<-[:DEFINED_BY]-(c1)<-[vg1:HAS_VOTE_ON]-(childD) | {criterionId: id(c1),  weight: vg1.avgVotesWeight, totalVotes: toInt(vg1.totalVotes)} ] AS weightedCriteria, 
[ (parentD)<-[:DEFINED_BY]-(ch1:Characteristic)<-[v1:HAS_VALUE_ON]-(childD)  WHERE NOT ((ch1)<-[:DEPENDS_ON]-())  | {characteristicId: id(ch1),  value: v1.value, totalHistoryValues: toInt(v1.totalHistoryValues), description: v1.description, valueType: ch1.valueType, visualMode: ch1.visualMode} ] AS valuedCharacteristics

Cypher version: CYPHER 3.1, planner: COST, runtime: INTERPRETED. 671326 total db hits in 426 ms.

Is there any chance to improve the performance based on uid ?

František Hartman · Accepted Answer · 2017-05-27T10:02:29.873

5

You are right not to use Neo4j internal ids in web urls, as they can be reused after node is deleted etc..

From performance point of view the internal id is as fast as you can get - it is actually an offset in file with node/relationship records (you could have noticed these are 2 separate id sequences, you can have node with id=z and relationship with same id=x).

Any use of an index has to be slower, because the database does index lookup first, gets the internal id and then reads the node record.

However for vast majority of the applications the difference in performance is negligible - will be likely an much smaller than network latency or general OGM overhead.

If you see a noticeable difference

verify the indexes exists in the database (e.g :schema in Neo4j browser)
turn on logging and verify your queries have the correct label (set info level for org.neo4j.ogm)
if the index exists and query contains right label then use PROFILE to check the query plan

UPDATED

Yes, index will be used for queries like:

MATCH (d:Decision) WHERE d.uid = {uid} ...

which should get generated by

session.load(Decision.class, uid)

if your index is primary or findByUid on DecisionRepository.

Beware that the index might not be used when the where clause appears in the middle of the query:

...
WITH x
MATCH (x)-[...]-(d) WHERE d.uid = {uid} ...

This depends on the query plan and you should use PROFILE to investigate this.

edited May 27 '17 at 10:02

answered May 27 '17 at 06:42

František Hartman

14,436
2
40
60

Thanks for your answer. Right now I'm trying to come with an approach how to refactor my system in order to avoid the issue with ID reusing and I see the following schema - in my web urls I'll use surrogate uid. If id is not required to be placed in web url I'll use internal Neo4j id.So the surrogate uuid will be used only in web urls otherwise in all other places on client I'm going to use internal Neo4j ID. Does it make sense ? – alexanoid May 27 '17 at 08:40
Having 2 ways to access entities by id might unnecessarily complicate things. I would go with custom uuid only. As I said index is *fast*, difference between internal id and index lookup will be order of magnitude smaller than network latency or general OGM overhead. – František Hartman May 27 '17 at 09:38
I have a very intencive usage of different Ids in a single query (https://stackoverflow.com/questions/43824894/neo4j-cypher-query-structure-and-performance-optimization) so the performance degradation with an approach based on a pure UIDs was noticeable by the human eye.. – alexanoid May 27 '17 at 09:42
I have updated my question and provided and output of :schema command and my SDN entities hierarchy. Could you please take a look ? – alexanoid May 27 '17 at 09:54
I have added PROFILE info for internal id vs indexed uid - the difference is 238 vs 426ms. Is there any chance to improve the performance based on uid ? – alexanoid May 27 '17 at 14:17
I have found the issue of bad performance on the Indexed id - previously I have created the id field in the base class. This way index doesn't work(in the Cypher query) on the derived child classes in the hierarchy. I have moved the index id filed to the concrete classes and now everything work as expected. I'm also really surprised because PROFILE shows the better performance now on the index id vs native Graph ID. – alexanoid Jun 18 '17 at 18:34

Neo4j SDN 4 GraphId performance vs Index

1 Answers1