From Goetz, Peierls, Bloch et al. 2006: Java Concurrency in Practice
3.1.2. Nonatomic 64-bit Operations
When a thread reads a variable without synchronization, it may see a stale value, but at least it sees a value that was actually placed there by some thread rather than some random value. This safety guarantee is called out-of-thin-air safety.
Out-of-thin-air safety applies to all variables, with one exception: 64-bit numeric variables (double and long) that are not declared volatile (see Section 3.1.4). The Java Memory Model requires fetch and store operations to be atomic, but for nonvolatile long and double variables, the JVM is permitted to treat a 64-bit read or write as two separate 32-bit operations. If the reads and writes occur in different threads, it is therefore possible to read a nonvolatile long and get back the high 32 bits of one value and the low 32 bits of another.[3]
Thus, even if you don't care about stale values, it is not safe to use shared mutable long and double variables in multithreaded programs unless they are declared volatile or guarded by a lock.
[3] When the Java Virtual Machine Specification was written, many widely used processor architectures could not efficiently provide atomic 64-bit arithmetic operations.
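To make the quoted passage concrete, here is a minimal sketch of what a "torn" read would look like and of the two remedies the book names. It does not provoke a real race (on most 64-bit JVMs you would likely never observe one); instead the hypothetical helper `tornRead` simply computes the value a reader would see if it got the high 32 bits of one write and the low 32 bits of another. The class and field names are my own, chosen for illustration.

```java
import java.util.concurrent.atomic.AtomicLong;

class TornReadSketch {
    // Plain long: the JVM is permitted to read or write it as two 32-bit halves.
    static long plainValue;

    // volatile long: reads and writes are guaranteed to be atomic.
    static volatile long volatileValue;

    // AtomicLong: atomic reads and writes, plus atomic read-modify-write operations.
    static final AtomicLong atomicValue = new AtomicLong();

    /**
     * Illustrative helper (an assumption, not a JVM API): returns the value a
     * reader would observe if it saw the high 32 bits of newValue combined
     * with the low 32 bits of oldValue.
     */
    static long tornRead(long oldValue, long newValue) {
        long high = newValue & 0xFFFFFFFF00000000L;
        long low = oldValue & 0x00000000FFFFFFFFL;
        return high | low;
    }

    public static void main(String[] args) {
        long oldValue = 0L;
        long newValue = -1L; // all 64 bits set

        // Neither 0 nor -1: a value that no thread ever wrote.
        long torn = tornRead(oldValue, newValue);
        System.out.printf("old=%016x new=%016x torn=%016x%n", oldValue, newValue, torn);

        // The safe alternatives always yield a value that was actually stored.
        volatileValue = newValue;
        atomicValue.set(newValue);
        System.out.printf("volatile=%016x atomic=%016x%n", volatileValue, atomicValue.get());
    }
}
```

With `oldValue = 0` and `newValue = -1`, the simulated torn value is `0xFFFFFFFF00000000`, which is exactly the kind of result that out-of-thin-air safety is supposed to rule out.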
This was written after the release of Java 5, which came out in 2004 with many changes aimed at making multithreaded and concurrent programming easier. So why does it still apply, even now, a decade later?
If it is only because Java apps can still run on 32-bit hardware, why isn't there a JVM run-time option that guarantees atomic 64-bit reads and writes when desired?
Wouldn't it be beneficial to be able to write multithreaded, low-latency applications without having to worry about this gotcha?