
I am trying to do a batch insertion into an existing database, but I get the following exception:

Exception in thread "GC-Monitor" java.lang.OutOfMemoryError: Java heap space
    at java.util.Arrays.copyOf(Arrays.java:2245)
    at java.util.Arrays.copyOf(Arrays.java:2219)
    at java.util.ArrayList.grow(ArrayList.java:242)
    at java.util.ArrayList.ensureExplicitCapacity(ArrayList.java:216)
    at java.util.ArrayList.ensureCapacityInternal(ArrayList.java:208)
    at java.util.ArrayList.add(ArrayList.java:440)
    at java.util.Formatter.parse(Formatter.java:2525)
    at java.util.Formatter.format(Formatter.java:2469)
    at java.util.Formatter.format(Formatter.java:2423)
    at java.lang.String.format(String.java:2792)
    at org.neo4j.kernel.impl.cache.MeasureDoNothing.run(MeasureDoNothing.java:64)
Fail: Transaction was marked as successful, but unable to commit transaction so rolled back.

Here is the structure of my insertion code:

public void parseExecutionRecordFile(Node episodeVersionNode, String filePath, Integer insertionBatchSize) throws Exception {
        Gson gson = new Gson();
        BufferedReader reader = new BufferedReader(new FileReader(filePath));
        String aDataRow = "";
        List<ExecutionRecord> executionRecords = new LinkedList<>();

        Integer numberOfProcessedExecutionRecords = 0;
        Integer insertionCounter = 0;
        ExecutionRecord lastProcessedExecutionRecord = null;
        Node lastProcessedExecutionRecordNode = null;

        Long start = System.nanoTime();
        while((aDataRow = reader.readLine()) != null) {
            JsonReader jsonReader = new JsonReader(new StringReader(aDataRow));
            jsonReader.setLenient(true);
            ExecutionRecord executionRecord = gson.fromJson(jsonReader, ExecutionRecord.class);
            executionRecords.add(executionRecord);

            insertionCounter++;

            if(insertionCounter == insertionBatchSize || executionRecord.getType() == ExecutionRecord.Type.END_MESSAGE) {
                lastProcessedExecutionRecordNode = appendEpisodeData(episodeVersionNode, lastProcessedExecutionRecordNode, executionRecords, lastProcessedExecutionRecord == null ? null : lastProcessedExecutionRecord.getTraceSequenceNumber());
                executionRecords = new LinkedList<>();
                lastProcessedExecutionRecord = executionRecord;
                numberOfProcessedExecutionRecords += insertionCounter;
                insertionCounter = 0;
            }
        }
    }

public Node appendEpisodeData(Node episodeVersionNode, Node previousExecutionRecordNode, List<ExecutionRecord> executionRecordList, Integer traceCounter) {
        Iterator<ExecutionRecord> executionRecordIterator = executionRecordList.iterator();

        Node previousTraceNode = null;
        Node currentTraceNode = null;
        Node currentExecutionRecordNode = null;

        try (Transaction tx = dbInstance.beginTx()) {
            // some graph insertion

            tx.success();
            return currentExecutionRecordNode;
        }
    }

So basically, I read JSON objects from a file (ca. 20,000 objects) and insert them into Neo4j every 10,000 records. If I have only 10,000 JSON objects in the file, it works fine. But when I have 20,000, it throws the exception above.

Thanks in advance and any help would be really appreciated!

Peter Sie
  • How much heap do you use? – Michael Hunger Aug 06 '15 at 19:41
  • How many relationships do you insert for those 20k rows? Your batch-size is not in your code-sample. – Michael Hunger Aug 06 '15 at 19:43
  • Hi @MichaelHunger, I use a 1024 MB heap. Actually, the insertion of these 10K records goes along with the insertion of other types of nodes, which can be half of the 10K. There are ca. 40K relationships per 10K records, so in total: 15K nodes + 40K relationships. I have managed to tweak my code so that the batch insertion with a 10K batch size works, but it can't be more than 10K. It crashes with the above exception if I set the batch size to 20K. – Peter Sie Aug 10 '15 at 15:25
  • Can you upgrade to 2.2.4 too? – Michael Hunger Aug 12 '15 at 15:01

2 Answers

If it works with 10,000 objects, try at least doubling the heap memory. Take a look at the following page: http://neo4j.com/docs/stable/server-performance.html

The wrapper.java.maxmemory option could resolve your problem.
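
On a Neo4j 2.x server installation this option lives in conf/neo4j-wrapper.conf; a minimal sketch (the values are in MB and only placeholders, tune them to your machine):

    wrapper.java.initmemory=1024
    wrapper.java.maxmemory=2048

If the database is embedded instead (as the dbInstance.beginTx() call in the question suggests), the equivalent is raising the JVM heap directly, e.g. starting the application with -Xmx2048m.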

alacambra

As you also insert several thousand properties, all of that transaction state is held in memory. So I think a 10k batch size is just fine for that amount of heap.

You also don't close your JsonReader, so it might linger around along with the StringReader inside it.

You should also use an ArrayList initialized to your batch size and call list.clear() instead of recreating/reassigning the list, as in the sketch below.
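
A minimal sketch of how the reading loop in parseExecutionRecordFile could look with both changes applied (same names as in the question; untested, so treat it as an illustration rather than a drop-in fix):

    // Reuse one ArrayList sized to the batch instead of allocating a new LinkedList per batch.
    List<ExecutionRecord> executionRecords = new ArrayList<>(insertionBatchSize);
    String aDataRow;
    while ((aDataRow = reader.readLine()) != null) {
        ExecutionRecord executionRecord;
        // try-with-resources closes the JsonReader (and the StringReader it wraps) after each line
        try (JsonReader jsonReader = new JsonReader(new StringReader(aDataRow))) {
            jsonReader.setLenient(true);
            executionRecord = gson.fromJson(jsonReader, ExecutionRecord.class);
        }
        executionRecords.add(executionRecord);
        insertionCounter++;

        if (insertionCounter == insertionBatchSize || executionRecord.getType() == ExecutionRecord.Type.END_MESSAGE) {
            lastProcessedExecutionRecordNode = appendEpisodeData(episodeVersionNode, lastProcessedExecutionRecordNode,
                    executionRecords, lastProcessedExecutionRecord == null ? null : lastProcessedExecutionRecord.getTraceSequenceNumber());
            lastProcessedExecutionRecord = executionRecord;
            numberOfProcessedExecutionRecords += insertionCounter;
            insertionCounter = 0;
            // Clear after the batch has been written so the same list can be reused.
            executionRecords.clear();
        }
    }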

Michael Hunger
  • Thanks Michael for your response. When I clear the batch list using .clear(), even my 10K batch insertion gives me the OutOfMemoryError. :( Closing the StringReader and JsonReader increases the elapsed time and doesn't solve the exception. :( – Peter Sie Aug 11 '15 at 16:56
  • Off-topic question: can the use of Future improve the insertion performance? I've tried it myself, and I need to wait for every thread to complete before starting the next one, otherwise it throws an OutOfMemoryError again. That way doesn't give any performance advantage. – Peter Sie Aug 11 '15 at 17:03
  • It's really weird. Now the first trial insertion (after the db files are initialized) always returns this OutOfMemoryError, but the following trials are then successful. – Peter Sie Aug 11 '15 at 17:55
  • Could you share the full code and your data - source files with me? michael at neo4j.com ? – Michael Hunger Aug 12 '15 at 15:00
  • Thank you for your willingness to review it. I have sent it to your email. Looking forward to receiving your feedback! – Peter Sie Aug 13 '15 at 20:11