I'm writing a library where:
- It will need to run on a wide range of different platforms / Java implementations (the common case is likely to be OpenJDK or Oracle Java on Intel 64 bit machines with Windows or Linux)
- Achieving high performance is a priority, to the extent that I care about CPU cache line efficiency in object access
- In some areas, quite large graphs of small objects will be traversed / processed (let's say around 1GB scale)
- The main workload is almost exclusively reads
- Reads will be scattered across the object graph, but not totally randomly (i.e. there will be significant hotspots, with occasional reads to less frequently accessed areas)
- The object graph will be accessed concurrently (but not modified) by multiple threads. There is no locking, on the assumption that concurrent modification will not occur.
Are there some rules of thumb / guidelines for designing small objects so that they utilise CPU cache lines effectively in this kind of environment?
I'm particularly interested in sizing and structuring the objects correctly, so that e.g. the most commonly accessed fields fit in the first cache line etc.
Note: I am fully aware that this is implementation dependent, that I will need to benchmark, and of the general risks of premature optimization. No need to waste any further bandwidth pointing this out. :-)