1
import java.util.ArrayList;


interface ICounter {
    void inc();
    void dec();
    int get();
}


class Counter implements ICounter {
    protected int counter = 0;

    public void inc() { counter++; }
    public void dec() { counter--; }

    public int get() { return counter; }
}

class SynchronizedCounter extends Counter {
    public synchronized void inc() { counter++; }
    public synchronized void dec() { counter--; }
}

class BetterCounter implements ICounter {
    protected long up = 0;
    protected long down = 0;

    public void inc()  { up++; }
    public void dec()  { down--; }

    public int get() { return (int)(up + down); }
}


class SynchronizedBetterCounter extends BetterCounter {
    private Object lock1 = new Object();
    private Object lock2 = new Object();

    public void inc() { synchronized(lock1) { up++; } }
    public void dec() { synchronized(lock2) { down--; }  }

}


class Inc extends Thread {
    ICounter i;

    public Inc(ICounter i) { this.i = i; }

    public void run() {
        for(int x = 0; x < 2147483647 ; x++) {
            i.inc();
        }
    }
}

class Dec extends Thread {
    ICounter i;

    public Dec(ICounter i) { this.i = i; }

    public void run() {
        for(int x = 0; x < 2147483647 ; x++) {
            i.dec();
        }
    }
}


public class Main {

    public static void main(String[] args) {
        ICounter[] c = {new Counter(), new SynchronizedCounter(), new BetterCounter(), new SynchronizedBetterCounter()};
        int numberOfCounters = 4;

        ArrayList<Inc> inc = new ArrayList<>();
        ArrayList<Dec> dec = new ArrayList<>();

        for(int i = 0; i < numberOfCounters; i++) {
            inc.add(new Inc(c[i]));
            dec.add(new Dec(c[i]));
        }

        long start = 0;
        long stop = 0;

        int returnVal[] = new int[4];
        long execTime[] = new long[4];

        for(int i = 0; i < numberOfCounters; i++) {
            start = System.currentTimeMillis();

            inc.get(i).start();
            dec.get(i).start();

            try {
                inc.get(i).join();
                dec.get(i).join();
            } catch (InterruptedException ex) {
            }

            stop = System.currentTimeMillis();
            returnVal[i] = c[i].get();
            execTime[i] = stop-start;
        }

        for(int i = 0; i < 4; i++) {
            System.out.println("Counter: " + c[i].getClass().getSimpleName());
            System.out.println("Value of counter: " + returnVal[i]);
            System.out.println("Execution time: " + execTime[i] + " ms\n");
        }
    }
}

The goal is to make counter with methods inc to increase value and dec to decrease it. It's written in 4 ways - the basic, with synchronization, with 2 variables (1 to increase, 2 to decrease) and with 2 variables and synchronization. There are 2 classes Inc and Dec for multithreading.

I know there can be problems with this code, but the goal is just to add and subtract big number, just once, as fast as possible.

Output

Counter: Counter
Value of counter: 2061724420
Execution time: 141 ms

Counter: SynchronizedCounter
Value of counter: 0
Execution time: 174210 ms

Counter: BetterCounter
Value of counter: 0
Execution time: 39468 ms

Counter: SynchronizedBetterCounter
Value of counter: 0
Execution time: 52176 ms

Question: Why first execution time is so small, comparing to the others?

John Kugelman
  • 349,597
  • 67
  • 533
  • 578
quikq
  • 347
  • 2
  • 13
  • 1
    synchronization is time-consuming. threads have to wait their turns to enter a critical section – mangusta Oct 11 '19 at 20:27
  • although in this case there are no other threads to wait for, the additional time is required to check the locks for availability – mangusta Oct 11 '19 at 20:34
  • Also note that the value of your non-synchronized counter is _wrong_. Fast but broken disqualifies it in my book. You should also check out atomic operations, which are designed to concurrently manipulate primitive types very efficiently. For example, [AtomicLong](https://docs.oracle.com/javase/8/docs/api/java/util/concurrent/atomic/AtomicLong.html) – Mark A Oct 11 '19 at 20:35
  • There indeed are some issues with the test. You might consider fixing them to focus on the actual case that confuses you (e.g. the `SynchronizedBetterCounter` uses two distinct locks - this does not achieve any real effect in this case). However, the crux is almost certainly in inlining. I mean, you didn't expect the loop to really be executed `MAX_VALUE` times, did you? In any case, the question is strongly related (or, considering that this about a "Counter", *nearly* a duplicate) of the one that I answered at https://stackoverflow.com/a/56009487/3182664 – Marco13 Oct 11 '19 at 20:40

2 Answers2

2

tl;dr: Counter vs SynchronizedCounter is comparing executing ~800M perfectly predictable instructions in each thread, vs ~250000M instructions with cross-core synchronization.

I ran the code with java -server -XX:+UnlockDiagnosticVMOptions -XX:CompileCommand=print,Inc.run -XX:+PrintCompilation Main and looked at the disassembly.

After some iterations, this is the bulk of Inc::run for the Counter case:

  0x00007fcdb12bc6c0: mov    %r9d,%r11d
  0x00007fcdb12bc6c3: add    %r8d,%r11d
  0x00007fcdb12bc6c6: mov    %r11d,%ecx
  0x00007fcdb12bc6c9: add    $0x10,%ecx
  0x00007fcdb12bc6cc: mov    %ecx,0xc(%r10)     ;*putfield counter
                                                ; - Counter::inc@7 (line 14)
                                                ; - Inc::run@12 (line 53)

  0x00007fcdb12bc6d0: add    $0x10,%r9d         ;*iinc
                                                ; - Inc::run@17 (line 52)

  0x00007fcdb12bc6d4: cmp    $0x7ffffff0,%r9d
  0x00007fcdb12bc6db: jl     0x00007fcdb12bc6c0  ;*if_icmpge
                                                ; - Inc::run@5 (line 52)

This is essentially equivalent to:

int value = i.counter;
for (; i<2147483632; i+=16) {
  value += 16;
  i.counter = value;
}

So in other words, it inlines the function, adds 16 on each iteration, and never bothers reading back the value.

For comparison, here is the 68 line SynchronizedCounter.inc method (see if you can spot the 3 lines that make up counter++ itself):

  0x00007fb9f356a380: mov    %eax,-0x14000(%rsp)
  0x00007fb9f356a387: push   %rbp
  0x00007fb9f356a388: sub    $0x40,%rsp
  0x00007fb9f356a38c: lea    0x20(%rsp),%rdi
  0x00007fb9f356a391: mov    %rsi,0x8(%rdi)
  0x00007fb9f356a395: mov    (%rsi),%rax
  0x00007fb9f356a398: mov    %rax,%rbx
  0x00007fb9f356a39b: and    $0x7,%rbx
  0x00007fb9f356a39f: cmp    $0x5,%rbx
  0x00007fb9f356a3a3: jne    0x00007fb9f356a42a
  0x00007fb9f356a3a9: mov    0x8(%rsi),%ebx
  0x00007fb9f356a3ac: shl    $0x3,%rbx
  0x00007fb9f356a3b0: mov    0xa8(%rbx),%rbx
  0x00007fb9f356a3b7: or     %r15,%rbx
  0x00007fb9f356a3ba: xor    %rax,%rbx
  0x00007fb9f356a3bd: and    $0xffffffffffffff87,%rbx
  0x00007fb9f356a3c1: je     0x00007fb9f356a452
  0x00007fb9f356a3c7: test   $0x7,%rbx
  0x00007fb9f356a3ce: jne    0x00007fb9f356a417
  0x00007fb9f356a3d0: test   $0x300,%rbx
  0x00007fb9f356a3d7: jne    0x00007fb9f356a3f6
  0x00007fb9f356a3d9: and    $0x37f,%rax
  0x00007fb9f356a3e0: mov    %rax,%rbx
  0x00007fb9f356a3e3: or     %r15,%rbx
  0x00007fb9f356a3e6: lock cmpxchg %rbx,(%rsi)
  0x00007fb9f356a3eb: jne    0x00007fb9f356a497
  0x00007fb9f356a3f1: jmpq   0x00007fb9f356a452
  0x00007fb9f356a3f6: mov    0x8(%rsi),%ebx
  0x00007fb9f356a3f9: shl    $0x3,%rbx
  0x00007fb9f356a3fd: mov    0xa8(%rbx),%rbx
  0x00007fb9f356a404: or     %r15,%rbx
  0x00007fb9f356a407: lock cmpxchg %rbx,(%rsi)
  0x00007fb9f356a40c: jne    0x00007fb9f356a497
  0x00007fb9f356a412: jmpq   0x00007fb9f356a452
  0x00007fb9f356a417: mov    0x8(%rsi),%ebx
  0x00007fb9f356a41a: shl    $0x3,%rbx
  0x00007fb9f356a41e: mov    0xa8(%rbx),%rbx
  0x00007fb9f356a425: lock cmpxchg %rbx,(%rsi)
  0x00007fb9f356a42a: mov    (%rsi),%rax
  0x00007fb9f356a42d: or     $0x1,%rax
  0x00007fb9f356a431: mov    %rax,(%rdi)
  0x00007fb9f356a434: lock cmpxchg %rdi,(%rsi)
  0x00007fb9f356a439: je     0x00007fb9f356a452
  0x00007fb9f356a43f: sub    %rsp,%rax
  0x00007fb9f356a442: and    $0xfffffffffffff007,%rax
  0x00007fb9f356a449: mov    %rax,(%rdi)
  0x00007fb9f356a44c: jne    0x00007fb9f356a497  ;*aload_0
                                                ; - SynchronizedCounter::inc@0 (line 21)

  0x00007fb9f356a452: mov    0xc(%rsi),%eax     ;*getfield counter
                                                ; - SynchronizedCounter::inc@2 (line 21)

  0x00007fb9f356a455: inc    %eax
  0x00007fb9f356a457: mov    %eax,0xc(%rsi)     ;*putfield counter
                                                ; - SynchronizedCounter::inc@7 (line 21)

  0x00007fb9f356a45a: lea    0x20(%rsp),%rax
  0x00007fb9f356a45f: mov    0x8(%rax),%rdi
  0x00007fb9f356a463: mov    (%rdi),%rsi
  0x00007fb9f356a466: and    $0x7,%rsi
  0x00007fb9f356a46a: cmp    $0x5,%rsi
  0x00007fb9f356a46e: je     0x00007fb9f356a48b
  0x00007fb9f356a474: mov    (%rax),%rsi
  0x00007fb9f356a477: test   %rsi,%rsi
  0x00007fb9f356a47a: je     0x00007fb9f356a48b
  0x00007fb9f356a480: lock cmpxchg %rsi,(%rdi)
  0x00007fb9f356a485: jne    0x00007fb9f356a4a7  ;*return
                                                ; - SynchronizedCounter::inc@10 

In addition to this, there's a 93 line loop in Inc::run that invokes the above on each of its 2 billion iterations. Envelope calculations therefore suggest the tl;dr from the start of this post.

that other guy
  • 116,971
  • 11
  • 170
  • 194
0

This looks for optimization done by Java compiler: for(int x = 0; x < 2147483647 ; x++) {i.inc()} can be simplified to i += 2147483647.

Run your code without JIT (-Djava.compiler=NONE) and you will see the difference.

rdk
  • 46
  • 6
  • 1
    While this is almost certainly true (also, as of https://stackoverflow.com/a/56009487/3182664 , and assuming that this works across being called via an `interface`), it is a veeery handwaving answer. More details (beyond the guess) would be appreciated. – Marco13 Oct 11 '19 at 20:44