Optimising Java objects for CPU cache line efficiency

Question

I'm writing a library where:

It will need to run on a wide range of different platforms / Java implementations (the common case is likely to be OpenJDK or Oracle Java on Intel 64 bit machines with Windows or Linux)
Achieving high performance is a priority, to the extent that I care about CPU cache line efficiency in object access
In some areas, quite large graphs of small objects will be traversed / processed (let's say around 1GB scale)
The main workload is almost exclusively reads
Reads will be scattered across the object graph, but not totally randomly (i.e. there will be significant hotspots, with occasional reads to less frequently accessed areas)
The object graph will be accessed concurrently (but not modified) by multiple threads. There is no locking, on the assumption that concurrent modification will not occur.

Are there some rules of thumb / guidelines for designing small objects so that they utilise CPU cache lines effectively in this kind of environment?

I'm particularly interested in sizing and structuring the objects correctly, so that e.g. the most commonly accessed fields fit in the first cache line etc.

Note: I am fully aware that this is implementation dependent, that I will need to benchmark, and of the general risks of premature optimization. No need to waste any further bandwidth pointing this out. :-)

I'm curious about the rationale for using Java for this application. You will be fighting the language every step of the way towards having anything approaching the control over data layout that you want. That degree of control would be trivial to achieve e.g. in C++. — Patricia Shanahan, Dec 31 '12 at 03:34
Sorry, I think that's an ambition you'll have to give up. :) In later versions (I don't remember exactly when it started), Hotspot will rearrange the internal structure of your classes for you to fit what it deems to be the best layout. — Dolda2000, Dec 31 '12 at 03:36
@PatriciaShanahan I doubt this would be "trivially" implemented in any language, especially C++ — TheLQ, Dec 31 '12 at 03:38
@Patricia. Good point. However I'm constrained to using the JVM for other reasons (mainly portability, libraries, integration with existing apps). So I'm trying to optimise within that constraint. Also with C++ I'm sure I'd still be fighting the language, just for different reasons (multi-threaded GC on complex object graphs?) :-) — mikera, Dec 31 '12 at 03:51
@TheLQ I'm not saying the whole task would be trivial. It obviously isn't. I do think it would be trivial to force objects to be arranged in memory whatever way the programmer wants. — Patricia Shanahan, Dec 31 '12 at 04:34
Have a look [here](http://www.infoq.com/presentations/click-crash-course-modern-hardware). A shame the site doesn't work on tablets (still using flash). — Stefan Hanke, Dec 31 '12 at 06:08
@PatriciaShanahan -- Less likely to cross cache lines unnecessarily. — Hot Licks, Dec 31 '12 at 14:21

score 12 · Answer 1 · answered Dec 31 '12 at 03:31

A first step towards cache line efficiency is locality of reference (i.e. keeping data that is accessed together close together in memory). This is hard to do in Java, where almost every object is allocated by the runtime and accessed by reference.

To avoid references, the following might be obvious:

have non-reference types (i.e. int, char, etc.) as fields in your objects
keep your objects in arrays
keep your objects small

These rules will at least ensure some locality of reference, both when working on a single object and when following the references in your object graph.

Another approach might be to not use objects for your data at all, but to have global primitive arrays (all of the same size), one for each item that would normally be a field in your class. Each instance is then identified by a common index into these arrays.
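A minimal sketch of this parallel-array idea follows. The class name `NodeStore` and the fields `ids`, `weights` and `next` are hypothetical and chosen only for illustration, and the arrays are held by an instance rather than being truly global. Whether this layout actually beats plain objects depends on the JVM and the access pattern, so treat it as something to benchmark rather than a recommendation.

```java
// Hypothetical example: a graph node with the fields (id, weight, next) is
// stored as three parallel primitive arrays instead of one object per node.
final class NodeStore {
    private final int[] ids;
    private final double[] weights;
    private final int[] next;   // index of the successor node, or -1 for none
    private int size;

    NodeStore(int capacity) {
        ids = new int[capacity];
        weights = new double[capacity];
        next = new int[capacity];
    }

    /** Appends a node and returns its index, which plays the role of a reference. */
    int add(int id, double weight, int nextIndex) {
        if (size == ids.length) {
            throw new IllegalStateException("store is full");
        }
        int index = size++;
        ids[index] = id;
        weights[index] = weight;
        next[index] = nextIndex;
        return index;
    }

    int id(int index) { return ids[index]; }

    double weight(int index) { return weights[index]; }

    int next(int index) { return next[index]; }

    /** Sums the weights along a chain; only the weights and next arrays are read. */
    double chainWeight(int start) {
        double total = 0.0;
        for (int i = start; i != -1; i = next[i]) {
            total += weights[i];
        }
        return total;
    }
}
```

A traversal such as `chainWeight` never touches the `ids` array, so cache lines are not spent on a field the loop does not need. The trade-off is that fields of one node now live in different cache lines, which hurts code that reads every field of a node at once.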

Then, to optimise the size of the arrays (or of chunks of them), you need to know the characteristics of the memory hierarchy (page size, cache size, number and size of cache lines, etc.). I don't know whether Java exposes this through the System or Runtime classes, but you could pass the information in as system properties at start-up.
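Passing the value as a system property could look roughly like the sketch below. The property name `cache.line.size`, the class `CacheConfig` and the 64-byte fallback are all assumptions made for illustration; 64 bytes is common on current x86-64 CPUs but is not guaranteed.

```java
// Hypothetical helper: reads the cache line size from a system property
// supplied at start-up, falling back to an assumed default.
final class CacheConfig {
    // Assumed property name; pick whatever fits your library.
    static final String LINE_SIZE_PROPERTY = "cache.line.size";
    // Assumption: 64 bytes, typical for x86-64 but not universal.
    static final int DEFAULT_LINE_SIZE = 64;

    private CacheConfig() {}

    /** Returns the configured cache line size in bytes, or the default. */
    static int cacheLineSize() {
        // Integer.getInteger returns the default if the property is
        // missing or cannot be parsed as an integer.
        int value = Integer.getInteger(LINE_SIZE_PROPERTY, DEFAULT_LINE_SIZE);
        return value > 0 ? value : DEFAULT_LINE_SIZE;
    }

    /** Number of elements of the given byte width that fit in one line. */
    static int elementsPerLine(int elementBytes) {
        if (elementBytes <= 0) {
            throw new IllegalArgumentException("elementBytes must be positive");
        }
        return Math.max(1, cacheLineSize() / elementBytes);
    }
}
```

The library would then be started with something like `java -Dcache.line.size=64 ...`, and `CacheConfig.elementsPerLine(Long.BYTES)` could be used to choose chunk sizes. Note that the JVM does not promise that an array's first element starts on a cache line boundary, so such chunking is approximate at best.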

Of course, this is totally orthogonal to what you should normally be doing in Java :)

Best regards

score 2 · Answer 2 · answered Jan 12 '14 at 17:26


You may need information about the various caches of your CPU. You can access it from Java using Cachesize (which currently supports Intel CPUs only). This can help when developing cache-aware algorithms.

Disclaimer: I am the author of the library.


Julien

