1

I need some advice regarding the code structure for small methods. Below is a method from the Java API. Collections.class

private static Random r;    
public static void shuffle(List<?> var0) {
    Random var1 = r;
    if (var1 == null) {
       r = var1 = new Random();
    }
    shuffle(var0, var1);
}

The code can be rewritten as

private static Random r;    
public static void shuffle(List<?> var0) {
    if (r == null) {
       r = new Random();
    }
    shuffle(var0, r);
}

I want to know if the 2nd method has any side-effects that I am missing. In what scenarios would one choose a particular way over the other?

  • 2
    I think it is *theoretically* possible that in the second snippet you call `shuffle` with `null` if some other thread modifies `r` after the `if` check and before the `shuffle` call. In the first one `var1` will never ever be `null`. – luk2302 May 03 '19 at 10:16
  • 2
    You first example show code stile that was common around 20 years ago. It is kind of optimization because it read memory only once (everything else is register access) when in second example there is two memory read. It is clearly not for readability purposes. – talex May 03 '19 at 10:25

1 Answers1

2

The original source code looks like

public static void shuffle(List<?> list) {
    Random rnd = r;
    if (rnd == null)
        r = rnd = new Random(); // harmless race.
    shuffle(list, rnd);
}

and hints to the point behind the design. The code uses the shared field r and is not thread safe, but designed in a way that the consequences are “harmless”.

Obviously, you should not design your software in that way. That’s left to the experts for certain software where they think, performance matters. (Actually, that shuffle method might not belong to this category)

So for your alternative

private static Random r;    
public static void shuffle(List<?> var0) {
    if (r == null) {
       r = new Random();
    }
    shuffle(var0, r);
}

the consequences of the missing thread safety would not be harmless. The test r == null and the subsequent invocation shuffle(var0, r) bear distinct reads of r and while there should only be transitions from the initial null to an initialized Random instance, reads and writes can be perceived out of order when no thread safe mechanism is used, so when a concurrent write happens, r == null could evaluate to false while the subsequent shuffle(var0, r) reads null.

This doesn’t make this variant wrong. If you document your method or the containing class as not being thread safe and requiring external synchronization when being used by different threads, there would be nothing wrong.

The shuffle method of the Collections class reads the value of r into a local variable, to ensure that the same value will be used by the rnd == null test and the shuffle(list, rnd) call. Since this may read a reference to a Random instance created by another thread, it relies on the fact that the Random instance is thread safe on its own, as otherwise, a racy read could exhibit an inconsistent object state.

It is still possible that the read misses a value previously written by another thread or that a write by another thread happens between the read of null and the subsequent write of a reference to a newly create Random instance. So it is possible that multiple threads will construct and use different Random instances here, which is considered within the “harmless” consequences.

As said, you should not copy this pattern. Rather, decide for either, thread safety or no thread safety, and document your code as such.

Holger
  • 285,553
  • 42
  • 434
  • 765
  • this has to be the most complicated part of java in general. those distinct reads and the fact that you are allowed to read a `null` on that second read (even without some thread actually doing an explicit `r = null`) is just insane to me. – Eugene May 14 '19 at 13:24
  • @Eugene it becomes easier to understand, when considering that optimized code where this happens doesn’t do two reads in different order, but that at least one of these reads simply didn’t happen. That’s why I try to bring the term *perception* into such topics, as code only behaves as if the reads or writes were done out of order, while the truth is that the optimized code doesn’t even remotely look like the original code at all. That’s why multiple threads may perceive the same events in a different order, in other words, act as if there is no absolute order of memory accesses. – Holger May 14 '19 at 15:37
  • it seems that it's even simpler to prove this... suppose you have this `T t = null; T get() { if (t == null) { t = generate(); } return t; } private T generate() { return null; }` there is nothing wrong re-writing that as `T get() { if (t == null) { t = generate(); return t; } return t; }`, or taking SSA form into consideration: (continue on next comment) – Eugene May 20 '19 at 13:37
  • `T get() { T t1 = t; if (t1 == null) { T t2 = generate(); t = t2; return t2; } T t3 = t; return t3; }` . it is visible that once we enter the `if` branch, we never hit `T t3 = t;` thus move that read above the check: `T get() { T t1 = t; T t3 = t; if (t1 == null) { T t2 = generate(); t = t2; return t2; } return t3; }` . (continue in the next comment) – Eugene May 20 '19 at 13:39
  • we can swap the reads freely now: `T t3 = t; T t1 = t; if (t1 == null) {...` it gets pretty obvious that now you can read a null in `T t3 = t;` and not a null in `T t1 = t`... – Eugene May 20 '19 at 13:43
  • @Eugene but there’s no reason for a sane optimizer, to keep `T t3 = t; T t1 = t;` as distinct true memory read operations. As said, it’s better to consider that the optimized code doesn’t look like the original code, not even remotely, most notably not bearing the written number of reads and writes. – Holger May 20 '19 at 16:15
  • this is what Alexey Shipilev presents in one of his examples btw, I just spent a few days understanding it, so apparently that could happen – Eugene May 20 '19 at 16:24