Why does false sharing not show up without volatile padding?

Question

public class VolatileTest {
    private static class T {
        public long p1, p2, p3, p4, p5; // comment out this line and run again
        public long x = 0L;
        public long y = 0L;
    }

    public static T[] arr = new T[2];

    static {
        arr[0] = new T();
        arr[1] = new T();
    }

    public static void main(String[] args) throws InterruptedException {
        Thread t1 = new Thread(() -> {
            for (long a = 0; a < 999999999L; a++) {
                arr[0].x = a;
            }
        });
        Thread t2 = new Thread(() -> {
            for (long a = 0; a < 999999999L; a++) {
                arr[1].y = a;
            }
        });

        final long start = System.nanoTime();
        t1.start();
        t2.start();
        t1.join();
        t2.join();
        System.out.println((System.nanoTime() - start) / 1000000);
    }
}

The first time I ran the code above, it took 519 ms. When I comment out line 3 and run it again, it takes 510 ms. Why does the second version (without padding) run faster? Thanks.

If you run it a few more times, you’ll probably get different numbers. Many things can affect system performance, especially at millisecond granularity. Maybe your system was cleaning out part of its file cache. Maybe it was downloading an update. Maybe the web browser was updating a page or an ad. Maybe you had new e-mail coming in. — VGR, May 11 '22 at 11:54
Of course, I ran it many times, and the timings are nearly the same. Padding should avoid false sharing, so it should run faster, but in fact it does not. — king, May 11 '22 at 14:41

pveentjer · Answer 1 · 2022-05-16T17:58:14.853

This benchmark can't be used to draw any conclusions because you don't run it long enough. I would suggest converting it to JMH and trying again.

Apart from that, the padding is broken:

You typically want to pad on both sides.
With the current approach, you have no way of knowing whether the padding actually ends up in front of the fields. You typically fix this by putting the padding in a superclass and a subclass. See JCTools for an example:

https://github.com/JCTools/JCTools/blob/master/jctools-core/src/main/java/org/jctools/util/PaddedAtomicLong.java

You don't pad enough. You typically want to pad 128 bytes on both sides due to adjacent cache line prefetching. You are 'padding' just 56 bytes.
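The superclass/subclass approach described above could look roughly like the sketch below. The class and field names are hypothetical (they are not taken from JCTools), and the result depends on how the JVM lays out fields: HotSpot places superclass fields ahead of subclass fields, but the Java language itself does not guarantee this.

```java
// Sketch only: names are hypothetical, and field layout is a JVM
// implementation detail rather than something the language guarantees.
class LeftPadding {
    // 16 longs = 128 bytes intended to sit in front of the hot field
    long l00, l01, l02, l03, l04, l05, l06, l07;
    long l08, l09, l10, l11, l12, l13, l14, l15;
}

class HotValue extends LeftPadding {
    // The field that one thread writes in a tight loop
    volatile long value;
}

class PaddedLong extends HotValue {
    // 16 longs = 128 bytes intended to sit behind the hot field
    long r00, r01, r02, r03, r04, r05, r06, r07;
    long r08, r09, r10, r11, r12, r13, r14, r15;
}
```

Each thread would then write to the value field of its own PaddedLong instance. A tool such as JOL can be used to check the resulting layout on a given JVM.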

[edit]

The JIT could optimize the following code:

Thread t1 = new Thread(()->{
        for(long a=0;a<999999999L;a++){
            arr[0].x = a;
        }
    });

To

Thread t1 = new Thread(()->{
    arr[0].x = 999999999L-1;
});

And since the value isn't read, in theory, it could even optimize it to:

Thread t1 = new Thread(()->{    
});

So you need to make sure that the JIT doesn't apply dead code elimination. JMH has facilities for that.
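As a rough illustration of the point about dead code elimination, here is a plain-Java sketch that makes the stores harder to remove: the fields are volatile and the written values are read back after the threads finish. The names WriterRounds, Cell, and timeRound are made up for this example, and this is not a substitute for JMH, which handles warm-up, forking, and result consumption far more carefully.

```java
// Sketch only: a hand-rolled stand-in to show the idea, not a real benchmark harness.
class WriterRounds {
    static class Cell {
        // volatile: in practice HotSpot does not collapse or drop these stores
        // the way it may for plain fields
        volatile long value;
    }

    // Runs two writer threads and returns the elapsed time in nanoseconds.
    // Assumes iterations >= 1.
    static long timeRound(Cell a, Cell b, long iterations) throws InterruptedException {
        Thread t1 = new Thread(() -> {
            for (long i = 0; i < iterations; i++) {
                a.value = i;
            }
        });
        Thread t2 = new Thread(() -> {
            for (long i = 0; i < iterations; i++) {
                b.value = i;
            }
        });

        long start = System.nanoTime();
        t1.start();
        t2.start();
        t1.join();
        t2.join();
        long elapsed = System.nanoTime() - start;

        // Read the results back so the written values are actually used.
        if (a.value != iterations - 1 || b.value != iterations - 1) {
            throw new IllegalStateException("unexpected final values");
        }
        return elapsed;
    }

    public static void main(String[] args) throws InterruptedException {
        Cell a = new Cell();
        Cell b = new Cell();
        // Several rounds, so the early ones can serve as warm-up for the JIT.
        for (int round = 0; round < 10; round++) {
            long ms = timeRound(a, b, 100_000_000L) / 1_000_000;
            System.out.println("round " + round + ": " + ms + " ms");
        }
    }
}
```

Only the later rounds are worth comparing, and even then the numbers are indicative at best.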

If p1, p2, p3, p4, p5, and y are declared volatile and the code is rerun, the run with line 3 uncommented is faster. That is strange. — king, May 16 '22 at 09:56
If you are not running for a longer period, then your conclusion is based on broken input. — pveentjer, May 16 '22 at 10:07
Another thing that could be at play here is that the code could be optimized out, since nobody reads the values that are written. Have a look at this article for some inspiration on how to write a JMH test for your case: https://www.baeldung.com/java-false-sharing-contended — pveentjer, May 16 '22 at 10:52
