
I am currently using Spring Data Neo4j (SDN) 4.1.0.M1 for a data set containing about 1.2 million nodes and 12 million relationships. The data structure behind the graph is very complex and hierarchical. In total there are 79 NodeEntities, and some of them contain more than 20 @Relationship attributes.

Below is an example of the hierarchy:

[Image: example of the domain hierarchy]

An example of my domain

@NodeEntity
public abstract class DatabaseObject implements java.io.Serializable {

    @GraphId
    private Long id;
    private Long dbId;
    private String stableIdentifier;
    private String displayName;

    @Relationship(type = "created")
    private InstanceEdit created;
    @Relationship(type = "modified")
    private List<InstanceEdit> modified;
    ...

@NodeEntity
public abstract class Event extends DatabaseObject {

    private String definition;
    private List<String> names;
    private Boolean isInferred;

    @Relationship(type = "authored", direction = "OUTGOING")
    private List<InstanceEdit> authored;
    @Relationship(type = "precedingEvent")
    private List<Event> precedingEvent;
    @Relationship(type = "literatureReference")
    private List<Publication> literatureReference;
    @Relationship(type = "regulatedBy")
    private List<Regulation> regulatedBy;
    ...

@NodeEntity
public class ReactionLikeEvent extends Event {

    private Boolean isChimeric; 
    private String systematicName;
    @Relationship(type = "input")
    private List<Input> input;
    @Relationship(type = "output")
    private List<Output> output;
    ...

An example of my Repositories

@Repository
public interface DatabaseObjectRepository extends GraphRepository<DatabaseObject>{
    DatabaseObject findByDbId(Long dbId);
    ...

Queries against the Neo4j REST service or the remote web admin perform as expected (10-100 ms at most), including queries for objects that have many relationships. Retrieving the same entries through SDN is drastically slower (100-500 ms). This performance drop only occurs when the query depth is set to 1. If the query depth is 0 and no relationships are returned, the response is fast. Indexes are created, and response times do not change when querying by the native Neo4j id.
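
A minimal sketch of how the depth 0 versus depth 1 comparison described above could be timed. This is an illustration only, not the measurement code behind the numbers quoted here: `DepthTiming`, `medianNanos` and `loadWithDepth` are hypothetical names, and the lambda is a stand-in for a repository call that takes a depth argument (assumed to be something like `findOne(id, depth)` on `GraphRepository`).

```java
import java.util.Arrays;
import java.util.function.IntFunction;

final class DepthTiming {

    /**
     * Runs the action `runs` times and returns the median elapsed time in
     * nanoseconds (the upper middle sample when `runs` is even).
     */
    static long medianNanos(Runnable action, int runs) {
        if (runs <= 0) {
            throw new IllegalArgumentException("runs must be positive");
        }
        long[] samples = new long[runs];
        for (int i = 0; i < runs; i++) {
            long start = System.nanoTime();
            action.run();
            samples[i] = System.nanoTime() - start;
        }
        Arrays.sort(samples);
        return samples[runs / 2];
    }

    public static void main(String[] args) {
        // Hypothetical stand-in: replace the body with the real repository
        // call that loads one entity at the given depth.
        IntFunction<Object> loadWithDepth = depth -> new Object();

        for (int depth = 0; depth <= 1; depth++) {
            final int d = depth;
            long nanos = medianNanos(() -> loadWithDepth.apply(d), 20);
            System.out.printf("depth %d: median %.3f ms%n", d, nanos / 1_000_000.0);
        }
    }
}
```

Taking the median over several runs keeps a single slow first call (connection setup, cold caches) from dominating the comparison.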

For other use cases (for example specific queries for smaller objects, @QueryResult objects or collections of objects) SDN performs nicely. My problem is specific to queries that retrieve objects with many relationships, or queries with increased depth (more than one). Is the bad performance a result of the complex domain hierarchy and overly rich NodeEntities? Do I need to reduce my hierarchy to achieve better performance?

Thanks for your help

  • Do you have a dataset/code that we could use to test this? Please open an issue at jira.spring.io/browse/DATAGRAPH with any data you're able to supply – Luanne Apr 08 '16 at 03:21
  • @Luanne I can share the project via bitbucket. Since this is still in production the repository is currently private. Alternatively I can share the cypher dump of the database (80mb compressed). Would that help? – fkorn Apr 11 '16 at 09:43
  • The project would be great- we'll have the entities etc. My bitbucket id is luannemisquitta - please email me at luanne at graphaware dot com with the portion of code to look at, thanks – Luanne Apr 11 '16 at 09:48

0 Answers