I was reading the JDK source code for ConcurrentHashMap.

But the following code confused me:

public boolean isEmpty() {
    final Segment<K,V>[] segments = this.segments;
    ...
}

My question is:

this.segments is declared as:

final Segment<K,V>[] segments;

So at the beginning of the method, a local reference of the same type is declared, pointing to the same memory.

Why did the author write it like this? Why didn't they use this.segments directly? Is there some reason?

OldCurmudgeon
HUA Di

3 Answers

94

This is an idiom typical for lock-free code involving volatile variables. At the first line you read the volatile once and then work with it. In the meantime another thread can update the volatile, but you are only interested in the value you initially read.
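A minimal sketch of the volatile case, assuming a hypothetical class `SnapshotHolder` whose `table` field is replaced wholesale by writers (none of these names come from the JDK). If `sum()` re-read the field on every loop iteration, a concurrent `replace()` with a shorter array could fall between the length check and the index access; with the local copy, the whole loop works on one array.

```java
// Hypothetical class for illustration; not taken from the JDK.
final class SnapshotHolder {
    // Writers replace the whole array; they never mutate it in place.
    private volatile int[] table = {1, 2, 3};

    void replace(int[] newTable) {
        table = newTable;
    }

    // Reads the volatile field once, then works only with the local copy.
    int sum() {
        final int[] t = this.table; // the single volatile read
        int total = 0;
        for (int i = 0; i < t.length; i++) {
            total += t[i]; // always the same array, even if replace() runs meanwhile
        }
        return total;
    }
}
```

A caller that invokes `sum()` while another thread calls `replace()` gets the sum of either the old or the new array, never a mix of the two.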

Also, even when the member variable in question is not volatile but final, this idiom has to do with CPU caches as reading from a stack location is more cache-friendly than reading from a random heap location. There is also a higher chance that the local var will end up bound to a CPU register.

For this latter case there is actually some controversy, since the JIT compiler will usually take care of those concerns, but Doug Lea is one of the guys who sticks with it on general principle.

Marko Topolnik
  • So if someone changes `this.segments` content, you won't see that change in your `segments`? – brimborium Oct 31 '12 at 10:33
  • Of course you'll see it. But if someone assigns something else to `segments`, you will obviously be isolated from that. – Marko Topolnik Oct 31 '12 at 10:33
  • Ah yes, sure. Misunderstood your answer... ;) – brimborium Oct 31 '12 at 10:36
  • As segments (the instance variable) is declared final, it cannot change? I remember that adding a superfluous local variable to the double-checked idiom could speed up execution in some VMs. Maybe this is a similar case. (@Marko already added that with a more detailed explanation, thanks!) – Pyranja Oct 31 '12 at 10:38
  • Thanks. About the cache-friendliness, is it necessary to do this for every member variable used in a method? – HUA Di Oct 31 '12 at 10:48
  • It is probably unnecessary due to the mentioned optimizations by the JIT. But yes, if you want to increase the chances of optimal code, for example if targeting a wider variety of JVMs, it might help. I would not advise it, though. JDK code idioms are not the best guide for client code, since the JDK must cover so many more concerns and scenarios in a single piece of code than any client code. – Marko Topolnik Oct 31 '12 at 10:54
  • Viewing a variable read in isolation from everything else, the speedup can indeed be significant, but even the worst-case read operation performance is typically drowned out by all other aspects of code. That is why in a real-life application you are very unlikely to ever notice the difference. – Marko Topolnik Oct 31 '12 at 10:56
  • @MarkoTopolnik You said " In the meantime another thread can update the volatile, but you are only interested in the value you initially read." .. Why are we only interested in the value that is initially read ? Isn't there a chance of seeing an out of date value by the reader thread ? – Geek Nov 07 '12 at 07:14
  • @Geek This idiom pertains to a method whose invariant is that the value read at the beginning remains fixed. As far as `java.util.concurrent` collection classes are concerned, such methods are the norm. – Marko Topolnik Nov 07 '12 at 09:22
19

I guess it's for performance reasons, so that the field value only needs to be read once.

You can refer to the lazy-initialization (double-check) idiom from Effective Java by Joshua Bloch.

His version is here:

private volatile FieldType field;
FieldType getField() {
  FieldType result = field;
  if (result == null) { 
    synchronized(this) {
      result = field;
      if (result == null) 
        field = result = computeFieldValue();
    }
  }
  return result;
}

and he wrote:

This code may appear a bit convoluted. In particular, the need for the local variable result may be unclear. What this variable does is to ensure that field is read only once in the common case where it’s already initialized. While not strictly necessary, this may improve performance and is more elegant by the standards applied to low-level concurrent programming. On my machine, the method above is about 25 percent faster than the obvious version without a local variable.
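For comparison, here is a sketch of what the "obvious version without a local variable" might look like. This is a reconstruction for illustration, not code from the book: the class `LazyHolder`, the method name `getFieldDirect`, the `computeCalls` counter, and the use of `String` in place of `FieldType` are all assumptions.

```java
// Hypothetical holder class; String stands in for the book's FieldType.
final class LazyHolder {
    private volatile String field;
    int computeCalls = 0; // only here to make the behavior observable

    private String computeFieldValue() {
        computeCalls++;
        return "computed";
    }

    // Version without a local variable. Once the field is initialized,
    // each call performs two volatile reads: the null check and the return.
    String getFieldDirect() {
        if (field == null) {
            synchronized (this) {
                if (field == null) {
                    field = computeFieldValue();
                }
            }
        }
        return field;
    }
}
```

The version with `result` returns the same value but touches the volatile field only once on the already-initialized path, which is the difference the quoted passage describes.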

larry.li
4

It may reduce byte code size - accessing a local variable is shorter in byte code than accessing an instance variable.
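A small sketch of the size difference, using a hypothetical class `Counter` (the names are illustrative only). The instruction sizes in the comments follow from the JVM instruction set: `getfield` takes a two-byte constant pool index, while `aload_1` has no operands. The exact output of `javap -c` may vary by compiler, so treat this as an approximation.

```java
// Hypothetical class for illustration.
final class Counter {
    private final int[] data = {4, 5, 6};

    // Each use of `data` compiles to aload_0 followed by getfield (1 + 3 bytes).
    int firstPlusLastField() {
        return data[0] + data[data.length - 1];
    }

    // After the initial copy into the local, each use of `d` is a one-byte aload_1.
    int firstPlusLastLocal() {
        final int[] d = this.data;
        return d[0] + d[d.length - 1];
    }
}
```

Both methods return the same result; only the bytecode used to reach the array differs.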

Runtime optimization overhead may be reduced too.

But none of these are significant. It's more about code style. If you feel comfortable with instance variables, by all means use them. Doug Lea probably feels more comfortable dealing with local variables.

irreputable