0

I'm trying to use double-checked locking to maintain an array of binomial coefficients, but I recently read that double-checked locking doesn't work. Efficiency is extremely important, so using volatile isn't an option unless it only comes into play inside the conditional blocks. I can't see a way to use a static class with a singleton object (this is part of a framework, and I don't know what kinds of numbers people will need the function for, so I can't guess what the maximum chosen value will be or whether the function will be used at all). The only thing I can think of is to make everything non-static and insist that each thread that needs this method instantiate a Choose object with its own array. It seems like that shouldn't be necessary.

public static final class Util {
    /**
     * Static array of nCr values
     */
    public static long[][] nCr_arr;

    /**
     * Calculate binomial coefficient (n k)
     * 
     * @param n
     *            n
     * @param k
     *            k
     * @return n choose k
     */
    public static long nCr(int n, int k) {
        if (k < 0)
            throw new ArithmeticException("Cannot choose a negative number");
        if (n < 0) {
            if (k % 2 == 0)
                return nCr(-n + k - 1, k);
            else
                return -nCr(-n + k - 1, k);
        }
        if (k > n)
            return 0;
        if (k > n / 2)
            k = n - k;
        if (nCr_arr == null) {
            synchronized (Util.class) {
                if (nCr_arr == null)
                    nCr_arr = new long[n + 1][];
            }
        }
        if (nCr_arr.length <= n) {
            synchronized (Util.class) {
                if (nCr_arr.length <= n) {
                    long[][] newNCR = new long[n + 1][];
                    System.arraycopy(nCr_arr, 0, newNCR, 0, nCr_arr.length);
                    nCr_arr = newNCR;
                }
            }
        }
        if (nCr_arr[n] == null) {
            synchronized (Util.class) {
                if (nCr_arr[n] == null)
                    nCr_arr[n] = new long[k + 1];
            }
        }
        if (nCr_arr[n].length <= k) {
            synchronized (Util.class) {
                if (nCr_arr[n].length <= k) {
                    long[] newNCR = new long[k + 1];
                    System.arraycopy(nCr_arr[n], 0, newNCR, 0,
                            nCr_arr[n].length);
                    nCr_arr[n] = newNCR;
                }
            }
        }
        if (nCr_arr[n][k] == 0) {
            if (k == 0)
                nCr_arr[n][k] = 1;
            else
                nCr_arr[n][k] = nCr(n, k - 1) * (n - (k - 1)) / k;
        }
        return nCr_arr[n][k];
    }
}
dspyz
  • I would recommend making that a real class. What happens if someone calls your method a second time with a larger `n`? The array will have been initialized but will not be large enough. Not only that, if there are threads that use this static method concurrently you will have serious problems. – Kevin Brock Aug 30 '10 at 06:14
  • Also, if you do not expect concurrent use then eliminate that synchronized block since it no longer matters, and then document this thoroughly. You wouldn't need to worry about locking at all if you are just using this in a single thread. But, as my comment above, if you are using this with multiple threads you have a lot of other problems. – Kevin Brock Aug 30 '10 at 06:16
  • Sorry, missed that you are attempting to expand the array in a subsequent test. There still are concurrency problems here with the later assignments. – Kevin Brock Aug 30 '10 at 06:21
  • Yes, this is used concurrently. What's the problem with the later assignments? – dspyz Aug 30 '10 at 15:41

7 Answers

0

Well, you could always avoid double checked locking by changing the code from:

if (nCr_arr == null) {
    synchronized (Util.class) {
        if (nCr_arr == null)
            nCr_arr = new long[n + 1][];
    }
}

to this:

synchronized (Util.class) {
    if (nCr_arr == null)
        nCr_arr = new long[n + 1][];
}

I bet the performance impact would be very small.

krtek
  • Most probably the impact will be really high: you would be locking on every use of the reference, whereas double-checked locking pays the cost of the lock only in the (rare) case where the outer check finds the array missing. – David Rodríguez - dribeas Aug 29 '10 at 20:23
  • @dribeas: Likely not unless this is highly contended. Recent JVMs do not do a real lock in that case so uncontended synchronize is quite fast. He would need to do some benchmarks in his specific environment to find out. – Kevin Brock Aug 30 '10 at 06:12
  • Well, it depends on how often this method is going to be called. Anyway, double-checked locking is still broken - so you could either use synchronisation or initialize the array at the first call of the method. – krtek Aug 30 '10 at 06:15
  • I'm running through a couple hundred million game positions per job on 8 simultaneous threads (8-core cpu). For every position, this method may be called anywhere from 1 to 10 times. It is arguably the most used method in the entire program and any unnecessary synchronization will kill all hope of finishing in any reasonable amount of time. Although the array will only be expanded a couple times, it will constantly be accessed by multiple threads simultaneously so putting "synchronized" outside the conditional clauses most definitely will generate heaps of contended accesses. – dspyz Aug 30 '10 at 15:39
0

Are you sure you need to optimize this? Have you profiled the running code and found that the single lock is too expensive?

0

Or rewrite your code using the Java concurrency API (http://download.oracle.com/javase/1.5.0/docs/api/java/util/concurrent/locks/ReadWriteLock.html) and acquire the write lock only when it is really needed.
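
A rough sketch of what that could look like (not part of the original answer; the class name and layout are invented for illustration): readers share the cheap read lock on every lookup, and the exclusive write lock is taken only on a cache miss. Note that ReentrantReadWriteLock cannot upgrade a read lock to a write lock, so the read lock is released before the write lock is acquired.

import java.util.Arrays;
import java.util.concurrent.locks.ReentrantReadWriteLock;

public final class ChooseCache {
    private static final ReentrantReadWriteLock LOCK = new ReentrantReadWriteLock();
    private static long[][] cache = new long[1][]; // cache[n][k]; 0 means "not computed"

    public static long nCr(int n, int k) {
        // negative-n handling and argument checks from the question omitted for brevity
        if (k > n)
            return 0;
        if (k > n / 2)
            k = n - k;
        if (k == 0)
            return 1;

        LOCK.readLock().lock(); // fast path: shared lock only
        try {
            if (cache.length > n && cache[n] != null
                    && cache[n].length > k && cache[n][k] != 0)
                return cache[n][k];
        } finally {
            LOCK.readLock().unlock();
        }

        // Cache miss: re-check and grow/fill the table under the exclusive lock.
        LOCK.writeLock().lock();
        try {
            if (cache.length <= n)
                cache = Arrays.copyOf(cache, n + 1);
            if (cache[n] == null || cache[n].length <= k)
                cache[n] = new long[k + 1];
            long c = 1;
            for (int i = 1; i <= k; i++) {
                c = c * (n - i + 1) / i; // multiplicative formula, exact in integers
                cache[n][i] = c;
            }
            return cache[n][k];
        } finally {
            LOCK.writeLock().unlock();
        }
    }
}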

krtek
  • Unfortunately, the API docs say, "if the read operations are too short the overhead of the read-write lock implementation [...] can dominate the execution cost, particularly as many read-write lock implementations still serialize all threads through a small section of code". Between this response and the ConcurrentHashMap solution, I can see clearly that Java tries its best to provide support for this sort of thing, and in this instance it's not good enough. So the best solution is to simply give a separate array to each thread. – dspyz Aug 30 '10 at 18:54
0

Given that you are using this in a very performance-critical part of your code, I recommend ditching the lazy initialization idea, because it requires several additional comparisons on every access to a coefficient.

Instead, I'd require the user of your library to specify at initialization time how many coefficients she needs. Alternatively, I'd precompute more than the user is ever likely to need - you can fit every nCk for n < 1000 into a few megabytes of memory.

PS: Might I suggest you use the recursive formula to compute a coefficient?

c[n][k] = c[n-1][k-1] + c[n-1][k]

It won't matter much, but why use the more complicated formula when all you need is Pascal's triangle?
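
A rough sketch of that precomputation idea (the PascalTable class and its constructor argument are made up here; the real library would pick its own API):

public final class PascalTable {

    private final long[][] c; // c[n][k] == nCr(n, k)

    public PascalTable(int maxN) {
        c = new long[maxN + 1][];
        for (int n = 0; n <= maxN; n++) {
            c[n] = new long[n + 1];
            c[n][0] = c[n][n] = 1;
            for (int k = 1; k < n; k++)
                c[n][k] = c[n - 1][k - 1] + c[n - 1][k]; // the recurrence above
        }
    }

    public long nCr(int n, int k) {
        if (k < 0 || k > n)
            return 0;
        return c[n][k];
    }
}

Once the table is built, every lookup is a plain array read with no locking or volatile access at all.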

meriton
  • Either formula will do and you're quite right that I could simply pre-compute all the reasonable values I might need in this case. However, in the future I may need to create a 3-dimensional array for all multinomial coefficients [x^k](1-x^h)^n/(1-x), and then the amount I can easily precompute will significantly drop. Given a choice between having to set up an arbitrary maximum and creating Choose objects which threads are responsible for keeping track of, I'd rather go with the Choose objects. – dspyz Aug 30 '10 at 18:47
0

It looks like you are building a cache of the results as they are calculated, so you could use a concurrent map to hold them, keyed by a single long that combines the two int values.

import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.ConcurrentMap;

public final class Util {
    /**
     * Cache of previously computed nCr values, keyed by (n, k) packed into a long
     */
    private static final ConcurrentMap<Long,Long> CACHE = 
        new ConcurrentHashMap<Long, Long>();

    /**
     * Calculate binomial coefficient (n k)
     * 
     * @param n
     *            n
     * @param k
     *            k
     * @return n choose k
     */
    public static long nCr(int n, int k) {
        if (k < 0)
            throw new ArithmeticException("Cannot choose a negative number");
        if (n < 0) {
            if (k % 2 == 0)
                return nCr(-n + k - 1, k);
            else
                return -nCr(-n + k - 1, k);
        }

        if (k > n)
            return 0;
        if (k > n / 2)
            k = n - k;

        final long key = ((long) n << 32) + k; // cast before shifting; an int shifted by 32 would be unchanged

        Long value = CACHE.get(key);
        if (value != null) {
            return value.longValue();
        } 

        long result;

        if (k == 0)
            result = 1;
        else
            result = nCr(n, k - 1) * (n - (k - 1)) / k;

        CACHE.put(key, result);

        return result;
    }
}
Michael Barker
  • Looking at the ConcurrentHashMap source, I see that the get() method begins with: if (count != 0) {//... But count is declared as volatile, so this requires a direct memory access every time. In general, java.util is more about convenience than efficiency, so using things out of the collections framework may have hidden drawbacks – dspyz Aug 30 '10 at 18:42
  • That's not always the case and depends on the CPU architecture. On Intel, volatile reads are quite cheap (an L1 or L2 cache access). It's far cheaper than a contended synchronized block, of which there are 4 potential cases in your code. Avoiding those classes is probably a premature optimisation - especially if you are struggling to produce a correct solution. Your code lacks an interaction on the read path that would establish the happens-before semantics of the memory model, meaning some writes may not be visible (this is the flaw in the double-checked pattern without volatile). – Michael Barker Aug 30 '10 at 19:14
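
A minimal sketch of the volatile fix the previous comment refers to (not code from the thread; it only covers safe publication of the resized outer array - values written into the rows afterwards would need the same treatment):

public final class Util {
    private static volatile long[][] nCr_arr; // volatile is the crucial difference

    static long[][] table(int n) {
        long[][] local = nCr_arr; // one volatile read on the fast path
        if (local == null || local.length <= n) {
            synchronized (Util.class) {
                local = nCr_arr; // re-read under the lock
                if (local == null || local.length <= n) {
                    long[][] grown = new long[n + 1][];
                    if (local != null)
                        System.arraycopy(local, 0, grown, 0, local.length);
                    local = grown;
                    nCr_arr = local; // safe publication via the volatile write
                }
            }
        }
        return local;
    }
}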
0

I ended up just making it not static. If a thread needs to get nCr values, it creates a new Coefficient object and holds onto it.
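
For reference, a minimal sketch of what such a per-thread object could look like (reconstructed for illustration, not the actual framework code); since nothing is shared, the hot path needs no locking and no volatile reads:

import java.util.Arrays;

public final class Coefficient {

    private long[][] cache = new long[1][]; // cache[n][k]; 0 means "not computed yet"

    public long nCr(int n, int k) {
        // negative-n handling from the question omitted for brevity
        if (k < 0)
            throw new ArithmeticException("Cannot choose a negative number");
        if (k > n)
            return 0;
        if (k > n / 2)
            k = n - k; // symmetry, as in the question
        if (cache.length <= n)
            cache = Arrays.copyOf(cache, n + 1);
        if (cache[n] == null)
            cache[n] = new long[k + 1];
        else if (cache[n].length <= k)
            cache[n] = Arrays.copyOf(cache[n], k + 1);
        if (cache[n][k] == 0)
            cache[n][k] = (k == 0) ? 1 : nCr(n, k - 1) * (n - k + 1) / k;
        return cache[n][k];
    }
}

If making every caller hold an instance is inconvenient, a ThreadLocal<Coefficient> gives each thread its own cache automatically while keeping the call sites static-looking.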

dspyz
0

The original code has way too many race conditions. For starters, you can't update the non-volatile nCr_arr and expect the double-check idiom to work. Declaring it volatile, on the other hand, defeats much of the purpose of the cache. Proper code should not use synchronization at all, but CAS.

A ConcurrentHashMap is a bad choice here as well (CHM doesn't scale well either). Using Long as the key is also not great because of how Long.valueOf works: since it doesn't always create a new object, HotSpot cannot properly inline it, and the final value field doesn't help either.

If anyone is (still) interested in how to do the code, drop a note. Cheers
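
Since that code was never posted, here is one way a CAS-based, lock-free version could look (a reconstruction of the idea with invented names, not the author's actual code): the whole table lives behind an AtomicReference, readers do a single volatile read, and writers race to publish a grown copy with compareAndSet.

import java.util.Arrays;
import java.util.concurrent.atomic.AtomicReference;

public final class CasChoose {

    // Rows are fully built before the table is published, so any row visible
    // through TABLE.get() already contains finished values.
    private static final AtomicReference<long[][]> TABLE =
            new AtomicReference<long[][]>(new long[1][]);

    public static long nCr(int n, int k) {
        // negative-n handling from the question omitted for brevity
        if (k < 0 || n < 0)
            throw new ArithmeticException("not handled in this sketch");
        if (k > n)
            return 0;
        if (k > n / 2)
            k = n - k;

        long[][] table = TABLE.get(); // one volatile read on the fast path
        if (table.length > n && table[n] != null && table[n].length > k)
            return table[n][k];

        while (true) { // slow path: grow a private copy and try to publish it
            long[][] current = TABLE.get();
            if (current.length > n && current[n] != null && current[n].length > k)
                return current[n][k]; // another thread already grew it
            long[][] grown = Arrays.copyOf(current, Math.max(current.length, n + 1));
            grown[n] = buildRow(n, k); // built privately, published by the CAS below
            if (TABLE.compareAndSet(current, grown))
                return grown[n][k];
            // lost the race: loop and re-check against the winner's table
        }
    }

    private static long[] buildRow(int n, int k) {
        long[] row = new long[k + 1];
        row[0] = 1;
        for (int i = 1; i <= k; i++)
            row[i] = row[i - 1] * (n - i + 1) / i; // multiplicative formula
        return row;
    }
}

The trade-off is that two threads may occasionally build the same row and one copy gets thrown away, in exchange for a read path with no locks at all.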

bestsss