weakCompareAndSwap vs compareAndSwap

Question

This question is not about the difference between them: I know what spurious failure is and why it happens on LL/SC. My question is: if I'm on Intel x86 and using Java 9 (build 149), why is there a difference between their assembly code?

public class WeakVsNonWeak {

    static jdk.internal.misc.Unsafe UNSAFE = jdk.internal.misc.Unsafe.getUnsafe();

    public static void main(String[] args) throws NoSuchFieldException, SecurityException {

        Holder h = new Holder();
        h.setValue(33);
        Class<?> holderClass = Holder.class;
        long valueOffset = UNSAFE.objectFieldOffset(holderClass.getDeclaredField("value"));

        int result = 0;
        for (int i = 0; i < 30_000; ++i) {
            result = strong(h, valueOffset);
        }
        System.out.println(result);

    }

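    // Despite its name, this method exercises whichever CAS variant is under test;
    // swap in UNSAFE.compareAndSwapInt to produce the other listing.
    private static int strong(Holder h, long offset) {
        int sum = 0;
        for (int i = 33; i < 11_000; ++i) {
            boolean result = UNSAFE.weakCompareAndSwapInt(h, offset, i, i + 1);
            if (!result) {
                sum++; // count failed swaps
            }
        }
        return sum;
    }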

    public static class Holder {

        private int value;

        public int getValue() {
            return value;
        }

        public void setValue(int value) {
            this.value = value;
        }
    }
}

Running with:

 java -XX:-TieredCompilation 
      -XX:CICompilerCount=1 
      -XX:+UnlockDiagnosticVMOptions  
      -XX:+PrintIntrinsics 
      -XX:+PrintAssembly 
      --add-opens java.base/jdk.internal.misc=ALL-UNNAMED
      WeakVsNonWeak

Output of compareAndSwapInt (relevant parts):

  0x0000000109f0f4b8: movabs $0x111927c18,%rsi  ;   {metadata({method} {0x0000000111927c18} 'compareAndSwapInt' '(Ljava/lang/Object;JII)Z' in 'jdk/internal/misc/Unsafe')}
  0x0000000109f0f4c2: mov    %r15,%rdi
  0x0000000109f0f4c5: test   $0xf,%esp
  0x0000000109f0f4cb: je     0x0000000109f0f4e3
  0x0000000109f0f4d1: sub    $0x8,%rsp
  0x0000000109f0f4d5: callq  0x00000001098569d2  ;   {runtime_call SharedRuntime::dtrace_method_entry(JavaThread*, Method*)}
  0x0000000109f0f4da: add    $0x8,%rsp
  0x0000000109f0f4de: jmpq   0x0000000109f0f4e8
  0x0000000109f0f4e3: callq  0x00000001098569d2  ;   {runtime_call SharedRuntime::dtrace_method_entry(JavaThread*, Method*)}
  0x0000000109f0f4e8: pop    %r9
  0x0000000109f0f4ea: pop    %r8
  0x0000000109f0f4ec: pop    %rcx
  0x0000000109f0f4ed: pop    %rdx
  0x0000000109f0f4ee: pop    %rsi
  0x0000000109f0f4ef: lea    0x210(%r15),%rdi
  0x0000000109f0f4f6: movl   $0x4,0x288(%r15)
  0x0000000109f0f501: callq  0x00000001098fee40  ;   {runtime_call Unsafe_CompareAndSwapInt(JNIEnv_*, _jobject*, _jobject*, long, int, int)}
  0x0000000109f0f506: vzeroupper 
  0x0000000109f0f509: and    $0xff,%eax
  0x0000000109f0f50f: setne  %al
  0x0000000109f0f512: movl   $0x5,0x288(%r15)
  0x0000000109f0f51d: lock addl $0x0,-0x40(%rsp)
  0x0000000109f0f523: cmpl   $0x0,-0x3f04dd(%rip)        # 0x0000000109b1f050

Output of weakCompareAndSwapInt:

  0x000000010b698840: sub    $0x18,%rsp
  0x000000010b698847: mov    %rbp,0x10(%rsp)
  0x000000010b69884c: mov    %r8d,%eax
  0x000000010b69884f: lock cmpxchg %r9d,(%rdx,%rcx,1)
  0x000000010b698855: sete   %r11b
  0x000000010b698859: movzbl %r11b,%r11d        ;*invokevirtual compareAndSwapInt {reexecute=0 rethrow=0 return_oop=0}
                                                ; - jdk.internal.misc.Unsafe::weakCompareAndSwapInt@7 (line 1369)

I am nowhere near experienced enough to understand the entire output, but I can definitely see the difference between lock addl and lock cmpxchg.

EDIT: Peter's answer got me thinking. Let's see whether compareAndSwap is compiled as an intrinsic:

-XX:+PrintIntrinsics -XX:-PrintAssembly

 @ 7   jdk.internal.misc.Unsafe::compareAndSwapInt (0 bytes)   (intrinsic)
 @ 20      jdk.internal.misc.Unsafe::weakCompareAndSwapInt (11 bytes)   (intrinsic).

Then I ran the example twice, with and without:

-XX:DisableIntrinsic=_compareAndSwapInt

This is sort of weird: the output is exactly the same (the same instructions). The only difference is that with the intrinsic enabled I get calls like this:

  0x000000010c23e355: callq  0x00000001016569d2  ;   {runtime_call SharedRuntime::dtrace_method_entry(JavaThread*, Method*)}
  0x000000010c23e381: callq  0x00000001016fee40  ;   {runtime_call Unsafe_CompareAndSwapInt(JNIEnv_*, _jobject*, _jobject*, long, int, int)}

And with it disabled:

  0x00000001109322d5: callq  0x0000000105c569d2  ;   {runtime_call _ZN13SharedRuntime19dtrace_method_entryEP10JavaThreadP6Method}
  0x00000001109322e3: callq  0x0000000105c569d2  ;   {runtime_call _ZN13SharedRuntime19dtrace_method_entryEP10JavaThreadP6Method}

This is rather intriguing; shouldn't the intrinsic code be different?

EDIT-2: the8472's answer makes sense too.

As far as I know, lock addl is a substitute for mfence that flushes the store buffer on x86, so it is indeed about visibility and not atomicity. Right before this instruction is:

 0x00000001133db6f6: movl   $0x4,0x288(%r15)
 0x00000001133db701: callq  0x00000001060fee40  ;   {runtime_call Unsafe_CompareAndSwapInt(JNIEnv_*, _jobject*, _jobject*, long, int, int)}
 0x00000001133db706: vzeroupper 
 0x00000001133db709: and    $0xff,%eax
 0x00000001133db70f: setne  %al
 0x00000001133db712: movl   $0x5,0x288(%r15)
 0x00000001133db71d: lock addl $0x0,-0x40(%rsp)
 0x00000001133db723: cmpl   $0x0,-0xd0bc6dd(%rip)        #     0x000000010631f050
                                            ;   {external_word}

If you look here, it will delegate to another native call, Atomic::cmpxchg, which seems to do the swap atomically.

Why that is not replaced by a direct lock cmpxchg is a mystery to me.

with your edits and numerous assembly samples from different optimization levels it's not quite clear what you're actually asking. — the8472, Dec 30 '16 at 17:12
So `sun.misc.Unsafe` still hasn’t gone, but moved to a different package, `jdk.internal.misc`, proving that it’s actually not a compatibility issue, that keeps that class alive? — Holger, Jan 03 '17 at 14:08
@Holger It has not moved, there are two versions now. as Shipilev says sun.misc.Unsafe will be deleted - this time for sure. There are multiple enhancements in the *other* places that sun.misc.Unsafe used to be useful that are now obsolete (like AtomicFieldUpdater). They have even added release/acquire semantics directly into the Unsafe! — Eugene, Jan 03 '17 at 14:14
I just thought that [`VarHandle`](http://download.java.net/java/jdk9/docs/api/?java/lang/invoke/VarHandle.html) is supposed to handle all this stuff officially and now I see an `Unsafe` class that apparently is even extended, compared to the Java 8 version. This doesn’t look like getting rid of it… — Holger, Jan 03 '17 at 14:22
@Holger sun.misc.Unsafe is going to be deleted, not the second one. They still need a way to expose Unsafe and make it, well safe. VarHandle is the safe PUBLIC api that jdk.internal.misc.Unsafe exposes. — Eugene, Jan 03 '17 at 14:25
Well, every time there is a non-public API wrapped by a public one, people are going to bypass the official API, thinking there is some benefit in using an unofficial API directly. I don’t see any reason why there has to be a class named `Unsafe` beneath the official API. It doesn’t contain any implementation anyway; it’s the JVM treating the calls as intrinsics or native methods handling the invocation, so there is no reason not to do that directly for the methods of the official API. In fact, that happens for a lot of API methods, but the *existence* of `Unsafe` creates a wrong impression. — Holger, Jan 03 '17 at 15:03
It speaks a lot that methods like `Unsafe.storeFence()` are not even used in Java 8 internally; this method is *only* used by 3rd party libraries… — Holger, Jan 03 '17 at 15:07
@Holger this slightly goes out of the scope of the question. btw there is a need for relaxed semantics, there has always been; how otherwise will they be exposed for people who *actually* need them? I mean we have lazySet for quite a while now and it has not killed anyone (yet). This would be a good question for someone who actually had taken these decisions. — Eugene, Jan 03 '17 at 15:10
I have no problem with additional semantics, though I have some doubts about methods offering semantics which don’t even exist within the memory model specification. I really hope that, now that they are becoming officially supported operations, someone is finally going to *document* them. But `VarHandle` is an abstraction that allows alternative JVM implementations, while the `Unsafe` class is tied to several assumptions about the implementation, e.g. that there is always a never-changing field offset. So offering these unofficial operations to 3rd party developers is hindering alternative implementations. — Holger, Jan 03 '17 at 15:20

score 8 · Accepted Answer · edited Nov 09 '17 at 03:59

TL;DR You're looking at the wrong place in the assembly output.

Both compareAndSwapInt and weakCompareAndSwapInt calls are compiled to exactly the same ASM sequence on x86-64. However, the methods themselves are compiled differently (but it does not usually matter).

The definitions of compareAndSwapInt and weakCompareAndSwapInt in the source code are different. The former is a native method, while the latter is a Java method.

@HotSpotIntrinsicCandidate
public final native boolean compareAndSwapInt(Object o, long offset,
                                              int expected,
                                              int x);

@HotSpotIntrinsicCandidate
public final boolean weakCompareAndSwapInt(Object o, long offset,
                                                  int expected,
                                                  int x) {
    return compareAndSwapInt(o, offset, expected, x);
}

What you've seen is how these standalone methods are compiled. A native method compiles to a stub that calls a corresponding C function. But this is not what runs in the fast path.
Intrinsic methods are those whose calls are replaced with a HotSpot-specific inline implementation. Note: the calls are replaced, but not the methods themselves.

If you look at the assembly output of your WeakVsNonWeak.strong method, you'll see that it contains a lock cmpxchg instruction, whether it calls UNSAFE.compareAndSwapInt or UNSAFE.weakCompareAndSwapInt.

0x000001bd76170c44: lock cmpxchg %ecx,(%r11)
0x000001bd76170c49: sete   %r10b
0x000001bd76170c4d: movzbl %r10b,%r10d        ;*invokevirtual compareAndSwapInt
                                              ; - WeakVsNonWeak::strong@25 (line 23)
                                              ; - WeakVsNonWeak::main@46 (line 14)

0x0000024f56af1097: lock cmpxchg %r11d,(%r8)
0x0000024f56af109c: sete   %r10b
0x0000024f56af10a0: movzbl %r10b,%r10d        ;*invokevirtual weakCompareAndSwapInt
                                              ; - WeakVsNonWeak::strong@25 (line 23)
                                              ; - WeakVsNonWeak::main@46 (line 14)

Once the main method is JIT-compiled, the standalone versions of the Unsafe.* methods will not be called directly.
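
For readers who want to experiment without jdk.internal.misc, here is a minimal sketch using the public AtomicInteger API (Java 9+). The class name WeakCasSketch and both helper methods are hypothetical names chosen for illustration and are not part of the original code. It puts a strong CAS step next to a weak CAS wrapped in the retry loop that possible spurious failures call for. Which instructions the JIT emits for each variant is something to verify in your own PrintAssembly output rather than take from this sketch.

```java
import java.util.concurrent.atomic.AtomicInteger;

public class WeakCasSketch {

    // Strong CAS: fails only if the current value differs from `expected`.
    static boolean strongStep(AtomicInteger v, int expected) {
        return v.compareAndSet(expected, expected + 1);
    }

    // Weak CAS is allowed to fail spuriously (e.g. on LL/SC hardware),
    // so it is re-read and retried until the swap goes through.
    static int weakIncrement(AtomicInteger v) {
        int current;
        do {
            current = v.get();
        } while (!v.weakCompareAndSetVolatile(current, current + 1));
        return current + 1;
    }

    public static void main(String[] args) {
        // Same shape as the loop in the question: CAS i -> i + 1, count failures.
        AtomicInteger v = new AtomicInteger(33);
        int failures = 0;
        for (int i = 33; i < 11_000; ++i) {
            if (!strongStep(v, i)) {
                failures++;
            }
        }
        System.out.println("strong failures: " + failures + ", value: " + v.get());

        // The weak variant needs the retry loop to give the same guarantee.
        AtomicInteger w = new AtomicInteger(33);
        for (int i = 33; i < 11_000; ++i) {
            weakIncrement(w);
        }
        System.out.println("weak value: " + w.get());
    }
}
```

The retry loop is the design point: code written against the weak variant has to tolerate a failed swap even when the expected value matched, which is why it is normally used only where a loop already exists.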

You are right: it's hard to read the output in its entire glory without some proper experience (which I lack). Your explanations are fantastic! What I've seen and shown in the question is the *individual* method output from the C2 compilation, which != intrinsic code; once the `strong` method is compiled, using `UNSAFE.compareAndSwapInt` or `UNSAFE.weakCompareAndSwapInt` yields the same output, meaning their intrinsic code is the same. — Eugene, Jan 17 '17 at 14:52

score 5 · Answer 2 · answered Dec 29 '16 at 14:56

In the first case, a native method is being used. Either the code hasn't been optimised or it's not an intrinsic.

In the second case an intrinsic has been used to inline the assembly required, rather than call a JNI method. I would have thought both cases would do this, but I guess not.

Peter Lawrey

Indeed you are *probably* right, but I am not sure why. See the edit. – Eugene Dec 29 '16 at 21:09
@Eugene I agree it appears backwards. The intrinsic should have the mov and the non-intrinsic should have the callq. – Peter Lawrey Dec 29 '16 at 21:52
That's not the point. The *compareAndSwap intrinsic* and the *compareAndSwap non-intrinsic* differ **only** in the functions called from callq. I was expecting a lot more. – Eugene Dec 29 '16 at 22:10
@Eugene I am pretty sure you can ignore the dtrace method entry calls. These shouldn't do anything (except help with tracing). – Peter Lawrey Dec 29 '16 at 22:12

score 4 · Answer 3 · answered Dec 29 '16 at 15:33

I believe the lock addl is not the atomic op itself but a store-load barrier implementation. The atomic operation happens in the callq.

Since you're already logging with PrintIntrinsics you should check if it actually gets intrinsified.
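
To make the barrier/atomicity distinction concrete, here is a minimal sketch of a store-load fence using the public VarHandle.fullFence() API (Java 9+). The class name StoreLoadFenceSketch and its fields are hypothetical, chosen only for illustration. It is a Dekker-style test: each thread stores to its own plain field, issues a full fence, then loads the other field. With the fence in place, both threads should not observe 0. That HotSpot on x86 implements such a fence with a lock addl on the stack is an assumption consistent with the listing in the question, not something this code proves.

```java
import java.lang.invoke.VarHandle;

public class StoreLoadFenceSketch {

    // Plain (non-volatile) fields on purpose: ordering comes only from the fence.
    static int x, y;
    static int r1, r2;

    // Runs one trial and reports whether both threads read 0,
    // i.e. whether a store was reordered after the following load.
    static boolean bothSawZero() {
        x = 0;
        y = 0;
        Thread a = new Thread(() -> {
            x = 1;
            VarHandle.fullFence(); // store-load barrier: the store to x is ordered before the load of y
            r1 = y;
        });
        Thread b = new Thread(() -> {
            y = 1;
            VarHandle.fullFence(); // same barrier on the other side
            r2 = x;
        });
        a.start();
        b.start();
        try {
            a.join();
            b.join();
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            throw new IllegalStateException(e);
        }
        // join() makes the writes to r1 and r2 visible to this thread.
        return r1 == 0 && r2 == 0;
    }

    public static void main(String[] args) {
        int violations = 0;
        for (int i = 0; i < 1_000; i++) {
            if (bothSawZero()) {
                violations++;
            }
        }
        System.out.println("trials where both threads read 0: " + violations);
    }
}
```

Note that the fence only orders memory accesses. It does not make any read-modify-write atomic, which is why the swap itself still needs a lock cmpxchg (or the native call that performs one).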

the8472

Indeed you are right too (see EDIT-2), but it does not answer the main question. Nevertheless, thank you for your input. – Eugene Dec 30 '16 at 11:10
