
I wanted to test insert speed with the latest Spring Data Neo4j 4. I modified the movies example to keep things simple and comparable.

Try running the test class movies.spring.data.neo4j.repositories.PersonRepositoryTest in this repository: https://github.com/fodon/neo4j-spring-data-speed-demo

In this example it takes 5 seconds to add 400 nodes.

For comparison, this is a speed test with the older version of Neo4j: https://github.com/fodon/gs-accessing-data-neo4j-speed

The hello.Application class is about 40x faster than spring-data-neo4j-4 for the same job.

Why is spring-data-neo4j-4 slower than the older version? How can it be sped up?

fodon

1 Answer


A call to save() is a direct persistence request against the database. There is currently no notion of deferring save() calls.

You can turn on query logging by adding a logback-test.xml file to your test resources:

<?xml version="1.0" encoding="UTF-8"?>
<configuration>

    <appender name="console" class="ch.qos.logback.core.ConsoleAppender">
        <encoder>
            <pattern>%d %5p %40.40c:%4L - %m%n</pattern>
        </encoder>
    </appender>

    <logger name="org.neo4j.ogm" level="info" />

    <root level="info">
        <appender-ref ref="console" />
    </root>

</configuration>

You can then see that each Person.save() actually makes three requests:

  • 1 for saving the Car objects
  • 1 for saving the Person object
  • 1 for creating the relationships


2016-07-25 05:27:51,093  INFO drivers.embedded.request.EmbeddedRequest: 155 - Request: UNWIND {rows} as row CREATE (n:`Car`) SET n=row.props RETURN row.nodeRef as nodeRef, ID(n) as nodeId with params {rows=[{nodeRef=-590487524, props={type=f27dc1bac12a480}}, {nodeRef=-1760792732, props={type=41ff5d3a69b4a5b4}}, {nodeRef=-637840556, props={type=3e7e77ca5e406a21}}]}
2016-07-25 05:27:54,117  INFO drivers.embedded.request.EmbeddedRequest: 155 - Request: UNWIND {rows} as row CREATE (n:`Person`) SET n=row.props RETURN row.nodeRef as nodeRef, ID(n) as nodeId with params {rows=[{nodeRef=-1446435394, props={name=bafd7ad2721516f8}}]}
2016-07-25 05:27:54,178  INFO drivers.embedded.request.EmbeddedRequest: 155 - Request: UNWIND {rows} as row MATCH (startNode) WHERE ID(startNode) = row.startNodeId MATCH (endNode) WHERE ID(endNode) = row.endNodeId MERGE (startNode)-[rel:`HAS`]->(endNode) RETURN row.relRef as relRefId, ID(rel) as relId with params {rows=[{startNodeId=3, relRef=-712176789, endNodeId=0}, {startNodeId=3, relRef=-821487247, endNodeId=1}, {startNodeId=3, relRef=-31523689, endNodeId=2}]}

Performance would be better if the Person creation statement instead took all 100 persons as a single parameter, and likewise for the Car objects.

As of now there is no such out-of-the-box feature in the OGM (open issue: https://github.com/neo4j/neo4j-ogm/issues/208).
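Until that issue is resolved, one possible workaround is to issue the UNWIND statement yourself through the OGM's Session.query(String, Map) method, passing all rows as a single parameter, matching the {rows} placeholders visible in the log above. A minimal sketch (the BatchInsert class, buildRows method, and name property are illustrative assumptions, not OGM API):

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Sketch of a manual batched insert: build one parameter map containing all
// rows, so a single UNWIND statement creates every node in one round trip.
public class BatchInsert {

    // One property map per node; all maps travel in a single "rows" parameter.
    static Map<String, Object> buildRows(List<String> names) {
        List<Map<String, Object>> rows = new ArrayList<>();
        for (String name : names) {
            Map<String, Object> props = new HashMap<>();
            props.put("name", name);
            rows.add(props);
        }
        Map<String, Object> params = new HashMap<>();
        params.put("rows", rows);
        return params;
    }

    public static void main(String[] args) {
        Map<String, Object> params = buildRows(List.of("alice", "bob"));
        // With an open org.neo4j.ogm.session.Session this would be executed as:
        // session.query("UNWIND {rows} AS row CREATE (n:Person) SET n = row", params);
        System.out.println(((List<?>) params.get("rows")).size()); // prints 2
    }
}
```
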

However, you can batch the writes yourself by saving a collection instead of saving entities one by one:

import java.util.ArrayList;
import java.util.List;
import java.util.Random;

import org.neo4j.ogm.session.Session;
import org.neo4j.ogm.session.SessionFactory;

@Test
@DirtiesContext
public void speedTest2() {
    SessionFactory sessionFactory = new SessionFactory("hello.neo.domain");
    Session session = sessionFactory.openSession();
    Random rand = new Random(10);
    System.out.println("Before linking up with Neo4j...");
    long start = System.currentTimeMillis();
    long mark = start;
    for (int j = 0; j < 10; j++) {
        // Build a batch of 100 persons, then persist them in a single save() call
        List<Person> batch = new ArrayList<>();
        for (int i = 0; i < 100; i++) {
            batch.add(new Person(rand));
        }
        session.save(batch);
        long now = System.currentTimeMillis();
        System.out.format("%d : Time:%d\n", j, now - mark);
        mark = now;
    }
}
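When the entities arrive as a stream rather than a pre-built list, the same idea still applies: buffer incoming objects and hand each full batch to a single save(collection) call. A minimal sketch of such a buffer (BatchBuffer and its method names are hypothetical helpers, not an OGM API):

```java
import java.util.ArrayList;
import java.util.List;
import java.util.function.Consumer;

// Hypothetical helper: buffers incoming items and flushes them in fixed-size
// batches, so each flush maps to one session.save(collection) round trip.
public class BatchBuffer<T> {
    private final int batchSize;
    private final Consumer<List<T>> flushAction; // e.g. session::save
    private final List<T> buffer = new ArrayList<>();

    public BatchBuffer(int batchSize, Consumer<List<T>> flushAction) {
        this.batchSize = batchSize;
        this.flushAction = flushAction;
    }

    public void add(T item) {
        buffer.add(item);
        if (buffer.size() >= batchSize) {
            flush();
        }
    }

    public void flush() {
        if (!buffer.isEmpty()) {
            flushAction.accept(new ArrayList<>(buffer)); // copy so the callee owns it
            buffer.clear();
        }
    }

    public static void main(String[] args) {
        BatchBuffer<String> names =
                new BatchBuffer<>(3, batch -> System.out.println("flushing " + batch.size()));
        for (int i = 0; i < 7; i++) {
            names.add("n" + i);
        }
        names.flush(); // flush the trailing partial batch
    }
}
```

Choosing the batch size is a trade-off: larger batches amortize the per-request overhead, but very large transactions consume more memory on both client and server.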

You can see that the difference in results is impressive:

Not initializing DB.
Before linking up with Neo4j...
0 : Time:7318
1 : Time:1731
2 : Time:1555
3 : Time:1481
4 : Time:1237
5 : Time:1176
6 : Time:1101
7 : Time:1094
8 : Time:1114
9 : Time:1015
Not initializing DB.
Before linking up with Neo4j...
0 : Time:494
1 : Time:272
2 : Time:230
3 : Time:442
4 : Time:320
5 : Time:247
6 : Time:284
7 : Time:288
8 : Time:366
9 : Time:222
Christophe Willemsen
  • Now insert time is not linear. 0 : Time:8764 1 : Time:3838 2 : Time:3508 3 : Time:4578 4 : Time:4752 5 : Time:6474 6 : Time:7589 7 : Time:8661 8 : Time:9376 9 : Time:12910 – fodon Jul 26 '16 at 03:04
  • FYI, I was comparing performance for 1000 node batches. Just for reference, the insert times for the old version were: 0 : Time:2863 1 : Time:1049 2 : Time:780 3 : Time:814 4 : Time:723 5 : Time:762 6 : Time:721 7 : Time:474 8 : Time:575 9 : Time:557 – fodon Jul 26 '16 at 03:10
  • Just for completeness, with 10 batches of 10k nodes, the mean insert time for each batch with the old version was 4.4s. Is it possible to get that speed with spring-data-neo4j-4 ? – fodon Jul 26 '16 at 03:16
  • I just pushed new code to the repo that tries to insert 10 batches of 10k blocks each. Do you have performance tests for the spring-data neo4j somewhere? If so, have you tested out performance of indexed reads? – fodon Jul 26 '16 at 03:26
  • I did a test for 1000 nodes too, after a bit of warmup the avg time is 800ms. I don't have performance tests for sdn4 but bear in mind that sdn3 was using the java api in embedded mode while sdn4 use cypher as abstraction for all the transport modes possible. On a side note, I don't see the point of using a data mapper for batch inserts – Christophe Willemsen Jul 26 '16 at 07:56
  • I modified the speedTest() code in PersonRepositoryTest() and posted on github. I get slow loads with 1k node batches (current code is for 10k node batches). Can you share what would be optimal for a stream of objects to be persisted? – fodon Jul 26 '16 at 17:02
  • This is not a case of batch inserts. It's a case of fast inserts of streams. Can you share how to do the fastest inserts with the new library? – fodon Jul 27 '16 at 18:32
  • I used the suggested version and I am getting "java.util.ConcurrentModificationException" for the list version. It could be due to the Neo4j version that I am using. It is running on a docker image. Could this be an issue? – Cugomastik Aug 16 '23 at 12:29