It isn't clear how much memory is assigned to each node - is it 256GB or 128GB? Either way, as I understand it, setting a max heap size smaller than the amount of memory assigned to a node will usually mean the application stays confined to a single node. This is true under Windows, Solaris and Linux, as far as I'm aware.
Even if you allocate a JVM max heap size greater than the memory assigned to a node, as long as your heap doesn't actually grow beyond that size the process won't spill onto other nodes, because the JVM object allocator will always try to create a new object in the same memory pool as the creating thread - and that includes new thread objects.
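A minimal sketch of what that means in practice (the sizes are arbitrary, and it assumes the OS uses a first-touch style policy so pages end up on the node of the thread that allocates and initialises them):

```java
public class LocalAllocDemo {
    public static void main(String[] args) throws InterruptedException {
        Runnable worker = () -> {
            // Allocated and first written by this thread, so the heap pages
            // backing it are expected to come from this thread's node.
            long[] local = new long[64 * 1024 * 1024]; // ~512MB, arbitrary size
            long sum = 0;
            for (int i = 0; i < local.length; i++) {
                local[i] = i;
                sum += local[i];
            }
            System.out.println(Thread.currentThread().getName() + " sum=" + sum);
        };
        Thread t1 = new Thread(worker, "worker-1");
        Thread t2 = new Thread(worker, "worker-2");
        t1.start();
        t2.start();
        t1.join();
        t2.join();
    }
}
```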
The primary design goal of the NUMA architecture is to enable different processes to operate on different CPUs with each CPU having localised memory access, rather than having all CPUs contend for the same global shared memory. Having the same process running across multiple nodes is not necessarily that efficient, unless you can arrange for a particular thread to always use the local memory associated with a specific node (thread affinity). Otherwise, remote memory access will slow you down.
I suspect that to use more than one node in your example you will need to either assign different tasks to different nodes, or parallelise the same task across multiple nodes. In the latter case you'll need to ensure that each node has its own copy of the data in local memory. There are libraries available to manage thread affinity from your Java code, for example:
https://github.com/peter-lawrey/Java-Thread-Affinity
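As an illustration, here's a minimal sketch of pinning a worker thread with that library (this assumes the net.openhft.affinity.AffinityLock API of recent releases - check the project README for the exact package and artifact names; the task itself is a placeholder):

```java
import net.openhft.affinity.AffinityLock;

public class PinnedWorker {
    public static void main(String[] args) throws InterruptedException {
        Runnable task = () -> {
            // Reserve a free CPU and pin this thread to it, so the thread
            // (and the objects it allocates) stay on that CPU's node.
            AffinityLock lock = AffinityLock.acquireLock();
            try {
                // ... do the node-local work here ...
                System.out.println(Thread.currentThread().getName() + " is pinned");
            } finally {
                lock.release(); // free the CPU for other threads
            }
        };
        Thread t = new Thread(task, "pinned-worker");
        t.start();
        t.join();
    }
}
```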