
I have a class Serializer:

class Serializer<T> implements Consumer<T> {
    final Consumer<? super T> actual;
    // constructor omitted for brevity
    @Override public synchronized void accept(T t) {
         actual.accept(t);
    }
}

The purpose is to make sure that actual is invoked by only one thread at a time. However, calling a callback while holding a lock is generally dangerous, so instead of holding a lock, callers queue up the incoming value and one of the threads goes in, drains the queue and calls the actual consumer in a loop, sequentially. (The other limitation is that the number of concurrent callers is not known.)

    final ConcurrentLinkedQueue<T> queue;
    final AtomicInteger wip;   // "work in progress": number of values not yet handed to actual

    @Override public void accept(T t) {
        queue.offer(t);
        if (wip.getAndIncrement() == 0) {       // 0 -> 1 transition: this thread becomes the drainer
            do {
                actual.accept(queue.poll());
            } while (wip.decrementAndGet() > 0); // keep draining while others queued more values
        }
    }

This works and, setting aside the problems with the unbounded queue, the thread hopping and a thread getting stuck in the loop, benchmarking shows only about 10% of the throughput of a direct method call in the single-threaded case. When I implement this queueing/emitting with a synchronized block, the benchmark gives 50% of the direct case because the JVM optimizes away the synchronization; that would be great, but it doesn't scale as well. Using a j.u.c. Lock scales, but suffers a similar single-threaded throughput degradation as the code above. If I'm not mistaken, once the JVM optimizes synchronized away, it still has to use some guard in case the method gets invoked concurrently again, so it can put the locking back.
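For illustration, a simplified sketch of what such a synchronized-block variant might look like; the class name and fields are placeholders for this example, not the exact benchmarked code:

import java.util.ArrayDeque;
import java.util.Queue;
import java.util.function.Consumer;

class SynchronizedSerializer<T> implements Consumer<T> {
    final Consumer<? super T> actual;
    final Queue<T> queue = new ArrayDeque<>();   // guarded by this
    boolean emitting;                            // guarded by this

    SynchronizedSerializer(Consumer<? super T> actual) {
        this.actual = actual;
    }

    @Override public void accept(T t) {
        synchronized (this) {
            if (emitting) {          // another thread is draining: enqueue and leave
                queue.offer(t);
                return;
            }
            emitting = true;         // become the drainer
        }
        actual.accept(t);            // call the consumer outside the lock
        for (;;) {
            T next;
            synchronized (this) {
                next = queue.poll();
                if (next == null) {  // nothing left: release the drainer role
                    emitting = false;
                    return;
                }
            }
            actual.accept(next);
        }
    }
}

The key point of this shape is that the lock is only held while touching the queue and the emitting flag, never while actual.accept runs.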

My question is: how can I achieve a similar effect with a lock, a queue or other serialization logic, i.e., have a cheap and fast path for the case when there is no concurrent call going on and another path for the concurrent case, so that the code scales and remains fast for single-threaded use as well?

akarnokd
  • If your code must be single threaded I'm not sure how it can scale... – assylias May 18 '14 at 16:34
  • This logic and class is part of a dataflow library where there is an operator that can merge multiple streams of data into a single stream, where each stream may be produced on its own thread. Scaling comes from how the concurrency is serialized for multiple producers: j.u.c. locks seem to work best, but they are too much of an overhead when only one data stream is active while the others are dormant for some reason. I'd like to get rid of the synchronization overhead in this case. – akarnokd May 18 '14 at 16:51

2 Answers


As said by others, synchronized already gives you a fairly efficient tool. So using something different should not be motivated by hoping for better performance.

However, if your intention is not to block callers, as you said in your question, it is legitimate (though this might imply living with lower performance).

The first thing I see when looking at your attempt using the queue and the atomic integer is that it is possible to bypass the queue when there are no pending items and no other consumer is running. In the case of low contention, that might reduce the overhead:

final ConcurrentLinkedQueue<T> queue;
final AtomicInteger wip;
@Override public void accept(T t) {
  if(wip.compareAndSet(0, 1)) { // no contention?
    actual.accept(t);
    if(wip.decrementAndGet()==0) return; // still no contention
  }
  else {
    if(!queue.offer(t))
      throw new AssertionError("queue should be unbounded");
    if(wip.getAndIncrement() != 0) return; // other consumer running
  }
  do {
    actual.accept(queue.poll());
  } while (wip.decrementAndGet() > 0);
}
Holger
  • Thanks, this is definitely an improvement in the case of single-threaded calls: I get 40 Mops/s instead of 18 Mops/s. Unfortunately, the else clause gives 6-10 Mops/s whereas a j.u.c. Lock gives 16 Mops/s. I suspect, however, that since the j.u.c. Lock parks, that throughput is actually achieved by introducing large latency. I'll try some other techniques on top of this to get all cases to good throughput. – akarnokd May 20 '14 at 07:24
  • @kd304: A `Lock` is basically an atomic integer and works similarly to the code above in the case of no contention. I would expect similar performance in the single-threaded case. Only if locking fails will the thread be parked (I assume you don't use a “fair” lock). If you experience different results, post your `Lock`-based solution code and we can have a look at it… – Holger May 20 '14 at 07:52
  • Thanks for the idea. I managed to reclaim some throughput by adaptively enabling and disabling the fast path. – akarnokd May 21 '14 at 18:18
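For illustration only, here is a hypothetical sketch of what "adaptively enabling and disabling the fast path" could look like on top of the answer's code; the class name, the missedFastPath counter and the back-off constant are invented for this example and are not the actual change mentioned in the comment:

import java.util.concurrent.ConcurrentLinkedQueue;
import java.util.concurrent.atomic.AtomicInteger;
import java.util.function.Consumer;

class AdaptiveSerializer<T> implements Consumer<T> {
    final Consumer<? super T> actual;
    final ConcurrentLinkedQueue<T> queue = new ConcurrentLinkedQueue<>();
    final AtomicInteger wip = new AtomicInteger();
    // Heuristic hint only; serialization correctness still rests on wip + queue,
    // so a plain (racy) int is acceptable here.
    int missedFastPath;

    AdaptiveSerializer(Consumer<? super T> actual) {
        this.actual = actual;
    }

    @Override public void accept(T t) {
        boolean fastPathEnabled = missedFastPath == 0;
        if (fastPathEnabled && wip.compareAndSet(0, 1)) {
            actual.accept(t);                       // uncontended: call directly, no queue
            if (wip.decrementAndGet() == 0) return;
        } else {
            if (fastPathEnabled) {
                missedFastPath = 8;                 // CAS lost: back off the fast path for a while
            } else {
                missedFastPath--;                   // count down towards re-enabling it
            }
            queue.offer(t);
            if (wip.getAndIncrement() != 0) return; // another thread is draining
        }
        do {
            actual.accept(queue.poll());            // drain whatever the other callers queued up
        } while (wip.decrementAndGet() > 0);
    }
}

The idea is that after a lost CAS the caller stops paying for the (now likely useless) fast-path attempt for a few calls, then gradually tries it again.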

This is what synchronized does for you already (it uses biased locking, amongst other strategies).

If you are asking whether there is a more efficient way to synchronize the data in a general way, then I would be very surprised if you can out-think decades of expertise in this area. If you do find a faster way you could be famous; it could be the "kd304 locking strategy".

If you are asking whether there are ways of reducing the overhead of specific/special-case pieces of code, then there definitely are, but this depends on what you are doing, which is not clear from the question.

Peter Lawrey
  • I'm asking for the reduction of overhead. The environment is the RxJava library and its SerializedObserver, which I mean to improve if possible. Currently it uses a synchronized block which has low overhead in single-threaded use, but if multiple threads use it concurrently, a j.u.c. Lock provides better throughput. In numbers, a direct call produces 300 Mops/s, synchronized does 100 Mops/s after some ramp-up and j.u.c. Lock gives 18 Mops/s. For 2 threads piling up, synchronized gives 15 Mops/s whereas j.u.c. Lock gives 21 Mops/s. This advantage seems to persist with higher numbers of threads. – akarnokd May 18 '14 at 17:46
  • @kd304 Optimising the code further will need an analysis of what the called code is doing. Perhaps you can change it to be concurrent, reduce the amount of time the lock needs to be held, or use a stamped lock. – Peter Lawrey May 18 '14 at 18:06