
I'm trying to understand the Java Memory Model but have been failing to get a point regarding CPU caches.

As far as I know it, in JVM we have the following locations to store local and shared variables:

local variables -- on thread stack

shared variables -- in memory, but every CPU cache has a copy of it

So my question is: why store local variables on the stack, and (cache) shared variables in the CPU cache? Why not the other way around (supposing the CPU cache is too expensive to store both): cache local variables in the CPU cache and just fetch shared variables from memory? Is this part of the Java language design or of the computer architecture?

Further: as simple as "CPU cache" sounds, what if several CPUs share one cache? And in systems with multi-level caches, which level of cache stores the copy of shared variables? Furthermore, if more than one thread runs on the same CPU core, does that mean they share the same set of cached shared variables, and hence that even if a shared variable is not declared volatile, writes to it are still instantly visible to the other threads running on the same core?
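To make the terms in the question concrete, here is a minimal sketch (the class and field names are illustrative, not taken from the question) of what "local" and "shared" mean at the Java level:

```java
// Illustrative only: 'Counter', 'shared', and 'local' are made-up names.
class Counter {
    static int shared = 0;      // shared variable: lives on the heap, reachable by every thread

    static void increment() {
        int local = shared;     // local variable: exists only in this thread's stack frame
        local = local + 1;      // no other thread can ever observe 'local'
        shared = local;         // this write to 'shared' is what other threads may (or may not) see
    }
}
```

In a single-threaded run the result is deterministic; the whole visibility question in this thread is about what a *second* thread observes of `shared`.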

  • For non-primitive types, the *"values of variables"* are just *references to* objects in Java. The *objects* themselves are not on the stack. Fields are always associated *with objects*, and since *objects* are not on the stack, the Java Memory Model is rather lenient about happens-before relationships for field reads/writes, which is where the thread-visibility issue often comes into play. – user2864740 Dec 25 '17 at 03:21
  • The topic is far too broad and unspecific for an SO question, and you also have several incorrect assumptions. "local variables -- on thread stack": this is conceptual Java architecture, which may have little relation to how the code actually runs on the CPU. A local variable is likely stored in main memory most of the time and subject to caching in CPU caches -- and not shared with other threads. "shared variables -- in memory, but every CPU cache has a copy of it" -- no: CPU caches aren't that big; a CPU only loads data when that data is used by that CPU. – Erwin Bolwidt Dec 25 '17 at 03:31

1 Answer


"Local" and "shared" variables are meaningless outside the context of your code. They don't influence where or even if the state is cached. It's not even useful to think or reason in terms of where your state is stored; the entire reason the JMM exists is so that details like these, which vary from architecture to architecture are not exposed to the programmer. By relying on low-level hardware details, you are asking the wrong questions about the JMM. It's not useful to your application, it makes it fragile, easier to break, harder to reason with, and less portable.

That said, in general you should assume that any program state -- if not all of it -- is eligible to be cached. What is cached does not actually matter, just that anything and everything can be, whether primitive types or reference types, or even state encapsulated by several fields. Whatever instructions a thread runs (and those instructions vary by architecture too -- beware!), it is up to the CPU to determine what is and is not worth caching; it is impossible for programmers to control this themselves (although it is possible to influence where state variables may be cached; see what false sharing is).
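As an aside on the false-sharing point: one common mitigation is manual padding, sketched below. This is a hedged illustration only -- the class and field names are made up, the JVM is free to reorder fields, and the 64-byte cache-line size is an assumption; `jdk.internal.vm.annotation.@Contended` does this properly but requires JVM flags.

```java
// Sketch of false-sharing mitigation via padding (assumes 64-byte cache lines;
// NOT a guaranteed layout -- the JVM may reorder or pack fields).
class PaddedCounter {
    volatile long value;                 // hot field written frequently by one thread
    long p1, p2, p3, p4, p5, p6, p7;     // padding: makes it unlikely that 'value'
                                         // of two adjacent instances shares a cache line
}
```

The idea is only to *influence* placement: two threads hammering the `value` fields of two different instances ideally no longer invalidate each other's cache line.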

Again, we can make some generalizations about x86: active primitive values are probably put in registers, because the FPUs/ALUs can work with them the fastest. Anything else goes, though. It's possible for primitives to be moved to the L1/L2 cache if they are core-local, and it's certainly possible that they would be overwritten quite quickly. The CPU might put state variables in a shared L3 if it anticipates a context switch in the future, or it might not. A hardware expert would need to answer that.

Ideally, state variables will be stored in the storage closest to the processing unit (registers, then the L1/L2/L3 caches, then main memory). That's up to the CPU to decide, though. It is impossible to reason about cache semantics at the Java level. Even with hyper-threading enabled (the AMD equivalent is SMT), the sibling threads do not share architectural state such as registers, and even if they did, recall that visibility is not the only problem with shared state: because the processor pipelines instructions, you still need the appropriate instructions to ensure correct ordering (even after you account for the CPU's read/write buffering), whether that is hwsync, the appropriate fences, or something else.
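The ordering problem mentioned above is exactly what a happens-before edge solves. A minimal sketch (class and method names are illustrative): without `volatile` on `ready`, the JMM would permit a reader to observe `ready == true` while still seeing the stale `data == 0`.

```java
// Illustrative names. The volatile write/read pair creates a happens-before edge,
// so a reader that observes ready == true must also observe data == 42.
class Publisher {
    int data = 0;
    volatile boolean ready = false;

    void writer() {
        data = 42;      // plain write, ordered before the volatile write below
        ready = true;   // volatile write: publishes 'data' to other threads
    }

    int reader() {
        while (!ready) { }  // spin until the volatile read observes the write
        return data;        // guaranteed 42 once ready is seen as true
    }
}
```

Note this guarantee comes from the JMM, not from knowing which cache level holds `data`; the JIT and CPU emit whatever fences the target architecture needs.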

Again, reasoning about the properties of the cache is not useful, both because the JMM handles that for you and because where/when/what is cached is indeterminable. Further, even if you did know the where/when/what, you STILL could not reason about data visibility: all caches treat cached data the same way anyway, and you would need to account for the processor moving lines between the ME(O)SI states, instruction ordering, load/store buffering, write-back/write-through policies, etc. And that is before you deal with the problems that can occur at the OS and JVM levels. Luckily, the JDK gives you basic tools such as volatile, final, and atomics that work consistently across all platforms and produce code that is predictable and easier to reason about.
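For instance, the atomics mentioned above bundle visibility and atomicity together so that no cache-level reasoning is needed. A short sketch (the class name is made up; `AtomicInteger` is the real JDK API):

```java
import java.util.concurrent.atomic.AtomicInteger;

// Illustrative wrapper: AtomicInteger gives atomic read-modify-write operations
// whose results are visible to all threads, on every platform, with no reasoning
// about caches, fences, or buffers required.
class SafeCounter {
    private final AtomicInteger count = new AtomicInteger(0);

    int increment() {
        return count.incrementAndGet(); // atomic increment, visible to all threads
    }

    int get() {
        return count.get();             // volatile-strength read
    }
}
```

Compare this with a plain `int count; count++;`, which is neither atomic nor guaranteed visible across threads.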

  • You answered all my questions and made the picture very clear for me. I think the most important thing is to understand that _caching_ is a _CPU-scope_ mechanism and has nothing to do with JVM design. It's still not very clear to me how the JVM thread stacks relate to the CPU when a piece of Java code is executed, but I think that is another question that is too broad to answer. Thanks very much. – justgivememyicecream Dec 25 '17 at 06:49