javax.cache store by reference vs. store by value

Question

I am new to Java caching and am trying to understand the difference between store by value and store by reference.

I found the paragraph cited below in the javax.cache documentation: "The purpose of copying entries as they are stored in a Cache and again when they are returned from a Cache is to allow applications to continue mutating the state of the keys and values without causing side-effects to entries held by a Cache."

What are the "side-effects" mentioned above? And how do we choose which mode to use in practice?

cruftex · Accepted Answer · 2021-06-08T09:48:19.703

The question is great, since the answer isn't an easy one. The real semantics vary slightly across cache implementations.

store by reference:

The cache stores and returns the identical object references.

Object key = ...
Object value = ...
cache.put(key, value);
assert cache.get(key) == value;
assert cache.iterator().next().getKey() == key;

If you mutate the key after storing the value, you have an ambiguous situation. The effect is the same when using a HashMap or ConcurrentHashMap.
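To make the ambiguity concrete, here is a minimal sketch using a plain HashMap as a stand-in for a store-by-reference cache. The class and method names are hypothetical and only serve the illustration. The key is a mutable list; once it changes, its hash code no longer matches the one recorded when the entry was stored.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

public class KeyMutationDemo {

    /** Stores a value under a mutable key, mutates the key, then looks it up again. */
    public static String lookupAfterKeyMutation() {
        Map<List<String>, String> map = new HashMap<>();
        List<String> key = new ArrayList<>();
        key.add("a");
        map.put(key, "value");

        // The map holds the same key object, so this also changes the stored key.
        // Its hashCode() is now different from the hash recorded at put() time.
        key.add("b");

        // The lookup uses the new hash and misses the entry.
        return map.get(key);
    }

    public static void main(String[] args) {
        // The entry still exists in the map but can no longer be found by its key.
        System.out.println(lookupAfterKeyMutation()); // prints "null"
    }
}
```

A lookup with a fresh list containing only "a" fails as well, because the stored key object now equals ["a", "b"]. The entry is still counted in the map's size but is unreachable through get.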

Use store by reference:

To maximize performance and minimize processing overhead
When the data fits into the Java heap
If you want to mutate a value after storing it. This can be useful for performance, but isn't a recommended practice, since you have to take care of concurrency issues and the usage relies on store by reference semantics.

store by value:

Although it seems obvious, it is not so clear what store by value really means. The spec leads of JCache describe it differently: Brian Oliver said it is protection against cache data corruption, while Greg Luck said it is everything that is not store by reference.

For that reason I analyzed several compliant (meaning they pass the TCK) JCache implementations. Key and value objects are copied when passed to the cache, but you cannot rely on an object in the cache being copied when it is returned to the application.

So this assumption isn't true for all JCache implementations:

assert cache.get(key) != cache.get(key);

JCache implementations may vary even more when it comes to the details. An example:

Map map = cache.getAll(...);
assert map.get(key) != map.get(key);

Here is a contradiction in the expected semantics. We would expect the map contents to be stable; on the other hand, the cache would need to return a copy of the value on every access. The JCache spec doesn't enforce concrete semantics for this. The devil is in the details.

Since the key is copied upon storage by every cache implementation, you get additional safety that the cache's internal data structures stay sane, but applications can still break because of shared value references.
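The following sketch shows how shared value references can still bite. It is not taken from any real JCache implementation: CopyOnPutCache is a hypothetical toy class that copies keys and values via Java serialization on put, but hands out the stored instance on get, which is one of the behaviors described above.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.ObjectInputStream;
import java.io.ObjectOutputStream;
import java.io.Serializable;
import java.util.ArrayList;
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;

/** Hypothetical toy cache: copies on put, returns the stored instance on get. */
public class CopyOnPutCache<K extends Serializable, V extends Serializable> {

    private final Map<K, V> store = new ConcurrentHashMap<>();

    public void put(K key, V value) {
        store.put(copy(key), copy(value));
    }

    public V get(K key) {
        return store.get(key); // no copy on read: callers share this instance
    }

    /** Deep copy through a serialization round trip. */
    @SuppressWarnings("unchecked")
    static <T extends Serializable> T copy(T original) {
        try {
            ByteArrayOutputStream bytes = new ByteArrayOutputStream();
            try (ObjectOutputStream out = new ObjectOutputStream(bytes)) {
                out.writeObject(original);
            }
            try (ObjectInputStream in =
                     new ObjectInputStream(new ByteArrayInputStream(bytes.toByteArray()))) {
                return (T) in.readObject();
            }
        } catch (IOException | ClassNotFoundException e) {
            throw new IllegalStateException("copy failed", e);
        }
    }

    public static void main(String[] args) {
        CopyOnPutCache<String, ArrayList<String>> cache = new CopyOnPutCache<>();
        ArrayList<String> value = new ArrayList<>();
        value.add("original");
        cache.put("k", value);

        value.add("changed after put");           // does not reach the cache
        System.out.println(cache.get("k").size()); // prints "1"

        cache.get("k").add("changed after get");   // mutates the cached instance
        System.out.println(cache.get("k").size()); // prints "2"
    }
}
```

The copy on put protects the cache from later changes to the caller's object, yet every reader still sees changes made through a reference obtained from get.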

My personal conclusion (I am open for discussion):

Since store by reference is an optional JCache feature, requesting it limits the number of cache implementations your application works with. Always use store by value, unless you rely on store by reference semantics.

However, don't make your application depend on the semantics you think you might get with store by value. Never mutate any object after handing its reference to the cache or after retrieving its reference from the cache.
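One way to follow this rule is to cache only immutable values and to update an entry by replacing it. The sketch below is an illustration under assumptions: Price and raise are hypothetical names, and a ConcurrentHashMap stands in for the cache so that the example runs without a JCache provider.

```java
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;

public class ImmutableValueDemo {

    /** Hypothetical immutable value: all fields final, no setters. */
    public static final class Price {
        private final String currency;
        private final long cents;

        public Price(String currency, long cents) {
            this.currency = currency;
            this.cents = cents;
        }

        public String currency() { return currency; }

        public long cents() { return cents; }

        /** Returns a new instance instead of changing this one. */
        public Price withCents(long newCents) {
            return new Price(currency, newCents);
        }
    }

    /** Updates an entry by replacing the value, never by mutating it. */
    public static void raise(Map<String, Price> cache, String key, long delta) {
        cache.computeIfPresent(key, (k, old) -> old.withCents(old.cents() + delta));
    }

    public static void main(String[] args) {
        Map<String, Price> cache = new ConcurrentHashMap<>(); // stand-in for a cache
        Price original = new Price("EUR", 100);
        cache.put("item", original);

        raise(cache, "item", 25);

        System.out.println(original.cents());          // prints "100"
        System.out.println(cache.get("item").cents()); // prints "125"
    }
}
```

With values like this, it no longer matters whether the cache copies on put, on get, or not at all, because no holder of a reference can change the object.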

If there is still doubt, ask your cache vendor. IMHO it's good practice to document implementation details. A good example (since I spent much thought on it...) is the JCache chapter in the cache2k user guide.

Adam · Answer 2 · 2017-03-13T20:13:13.780

It is to prevent concurrent modification of mutable objects. The side effects hit other threads that are using the same object for something.

An example would be a bank program in which multiple threads share a cache of mutable objects representing bank accounts. Suppose thread 1 retrieves an account from the cache and starts to perform an operation on it. While thread 1 is manipulating the object, thread 2 retrieves the same object and starts to manipulate it as well. Since they are manipulating the same object simultaneously and without coordination, the result is unpredictable. The object itself can even become corrupted.

Storing by value eliminates this common problem in concurrent programming if the cache stores a copy of the object when it is saved and hands out a copy of the object when it is retrieved.
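The copy-in, copy-out behavior described here can be sketched as follows. This is a minimal illustration, not how any particular JCache implementation works: CopyingCache and Account are hypothetical, the copy function is supplied by the caller, and keys are assumed to be immutable (such as String), so only values are copied.

```java
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;
import java.util.function.UnaryOperator;

/** Hypothetical toy cache that copies values on both put and get. */
public class CopyingCache<K, V> {

    private final Map<K, V> store = new ConcurrentHashMap<>();
    private final UnaryOperator<V> copier;

    public CopyingCache(UnaryOperator<V> copier) {
        this.copier = copier;
    }

    public void put(K key, V value) {
        store.put(key, copier.apply(value)); // the cache keeps its own copy
    }

    public V get(K key) {
        V stored = store.get(key);
        return stored == null ? null : copier.apply(stored); // each caller gets a private copy
    }

    /** Hypothetical mutable value type with a copy constructor. */
    public static final class Account {
        private long balance;

        public Account(long balance) { this.balance = balance; }

        public Account(Account other) { this.balance = other.balance; }

        public long getBalance() { return balance; }

        public void deposit(long amount) { balance += amount; }
    }

    public static void main(String[] args) {
        CopyingCache<String, Account> cache = new CopyingCache<>(Account::new);
        cache.put("acct", new Account(100));

        Account seenByThread1 = cache.get("acct");
        Account seenByThread2 = cache.get("acct");
        seenByThread1.deposit(10); // only changes thread 1's private copy

        System.out.println(seenByThread2.getBalance());     // prints "100"
        System.out.println(cache.get("acct").getBalance()); // prints "100"
    }
}
```

Because no caller ever holds a reference to the stored instance, uncoordinated changes cannot corrupt the cached state. A change only becomes visible to others when it is written back with put, and concurrent write-backs still need coordination.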

Hi Adam! Your answer is correct (and got a +1), however, there are some more details to that topic. See my longish answer.... — cruftex, Mar 14 '17 at 05:59
