7

So I have code that currently looks like this

    public boolean in(TransactionType... types)
    {
        if (types == null || types.length == 0)
            return false;

        for (int i = 0; i < types.length; ++i)
            if (types[i] != null && types[i] == this)
                return true;
        return false;
    }

I changed it to this

    public boolean in(TransactionType... types)
    {
        if (types == null || types.length == 0)
            return false;

        for (int i = 0; i < types.length; ++i)
            if (types[i] == this)
                return true;
        return false;
    }

(TransactionType is an enum with roughly 30 values in it)

The results shocked me. In all of my tests, the second was an order of magnitude faster. I expected maybe 2x faster, but not an order of magnitude. Why the difference? Is it the nullcheck that is that much slower or is something strange happening with the extra array access?

My benchmark code looks like this

public class App
{
    public enum TransactionType
    {
        A(1, "A", "A"),
        B(3, "B", "B"),
        C(5, "C", "C"),
        D(6, "D", "D"),
        E(7, "E", "E"),
        F(8, "F", "F"),
        G(9, "G", "G"),
        H(10, "H", "H"),
        I(11, "I", "I"),
        J(12, "J", "J"),
        K(13, "K", "K"),
        L(14, "L", "L"),
        M(15, "M", "M"),
        N(16, "N", "N"),
        O(17, "O", "O"),
        P(18, "P", "P"),
        Q(19, "Q", "Q"),
        R(20, "R", "R"),
        S(21, "S", "S"),
        T(22, "T", "T"),
        U(25, "U", "U"),
        V(26, "V", "V"),
        W(27, "W", "W"),
        X(28, "X", "X"),
        Y(29, "Y", "Y"),
        Z(30, "Z", "Z"),
        AA(31, "AA", "AA"),
        AB(32, "AB", "AB"),
        AC(33, "AC", "AC"),
        AD(35, "AD", "AD"),
        AE(36, "AE", "AE"),
        AF(37, "AF", "AF"),
        AG(38, "AG", "AG"),
        AH(39, "AH", "AH"),
        AI(40, "AI", "AI"),
        AJ(41, "AJ", "AJ"),
        AK(42, "AK", "AK"),
        AL(43, "AL", "AL"),
        AM(44, "AM", "AM"),
        AN(45, "AN", "AN"),
        AO(46, "AO", "AO"),
        AP(47, "AP", "AP");

        public final static TransactionType[] aArray =
        {
            O, Z, N, Y, AB
        };

        public final static TransactionType[] bArray =
        {
            J, P, AA, L, Q, M, K, AE, AK,
            AF, AD, AG, AH
        };

        public final static TransactionType[] cArray =
        {
            S, U, V
        };

        public final static TransactionType[] dArray =
        {
            A, B, D, G, C, E,
            T, R, I, F, H, AC,
            AI, AJ, AL, AM, AN,
            AO
        };

        private int id;
        private String abbrev;
        private String name;

        private TransactionType(int id, String abbrev, String name)
        {
            this.id = id;
            this.abbrev = abbrev;
            this.name = name;
        }

        public boolean in(TransactionType... types)
        {
            if (types == null || types.length == 0)
                return false;

            for (int i = 0; i < types.length; ++i)
                if (types[i] == this)
                    return true;
            return false;
        }

        public boolean inOld(TransactionType... types)
        {
            if (types == null || types.length == 0)
                return false;

            for (int i = 0; i < types.length; ++i)
            {
                if (types[i] != null && types[i] == this)
                    return true;
            }
            return false;
        }
    }

    public static void main(String[] args)
    {
        for (int i = 0; i < 10; ++i)
            bench2();

        for (int i = 0; i < 10; ++i)
            bench1();
    }

    private static void bench1()
    {
        final TransactionType[] values = TransactionType.values();
        long runs = 0;
        long currTime = System.currentTimeMillis();
        while (System.currentTimeMillis() - currTime < 1000)
        {
            for (TransactionType value : values)
            {
                value.inOld(TransactionType.dArray);
            }
            ++runs;
        }
        System.out.println("old " + runs);
    }

    private static void bench2()
    {
        final TransactionType[] values = TransactionType.values();
        long runs = 0;
        long currTime = System.currentTimeMillis();
        while (System.currentTimeMillis() - currTime < 1000)
        {
            for (TransactionType value : values)
            {
                value.in(TransactionType.dArray);
            }
            ++runs;
        }
        System.out.println("new " + runs);
    }
}

Here are the results of the benchmark running

new 20164901
new 20084651
new 45739657
new 45735251
new 45757756
new 45726575
new 45413016
new 45649661
new 45325360
new 45380665
old 2021652
old 2022286
old 2246888
old 2237484
old 2246172
old 2268073
old 2271554
old 2259544
old 2272642
old 2268579

This is using Oracle JDK 1.7.0.67

Cogman
  • 2,070
  • 19
  • 36
  • There's no way to measure something accurately with such a benchmark. Micro-benchmarking is surprisingly difficult. And it's useless in this case, because the difference between the two versions is insignificant (if any). You should prefer the clearer, less redundant code. – JB Nizet Oct 22 '14 at 17:34
  • 2
    I disagree. I'm seeing consistent results with this benchmarking code that reproducibly shows an order of magnitude difference between the two sets of code. The problem most microbenchmarks have is not running for long enough to get the optimizer to kick in. I've found that 10 seconds is generally long enough to get the optimizer to kick in. – Cogman Oct 22 '14 at 17:39
  • 1
    @JBNizet - is that true if one form consistently gets an order of magnitude worse result? – Michael Burr Oct 22 '14 at 17:39
  • Does anything change if the `for` loop consists of: `{ TransactionType tmp = type[i]; if (tmp != null && tmp == this) return true; }` – Michael Burr Oct 22 '14 at 17:41
  • @MichaelBurr Nope. There is no/little difference in performance. So it looks like java did a good job with the array optimization. The null check, for whatever reason, is just slow. – Cogman Oct 22 '14 at 17:46
  • I'd recommend Caliper: https://code.google.com/p/caliper/ for micro-benchmarking. It repeats the test with different numbers of repetitions to try to get a consistent result. Gives you a bit more confidence your test harness isn't messing up your fixture. – Andrew Aylett Oct 22 '14 at 17:59
  • This is interesting. Maybe the JVM is doing type checks or something? The == operator should be very fast, as it is just comparing object instance IDs, I am sure. – Jamie Oct 22 '14 at 18:34
  • @Jamie Right? That is pretty much what I thought was happening. My best guess is that somehow the null check is busting some internal optimization which assumes all elements in the array are non-null. – Cogman Oct 22 '14 at 18:59
  • 2
    It could have more to do with thread management than the null check. Try replacing the null check with "if (types.length < 1000 && types.lengths == 0)" to see if that has the same effect on your code. If so, then the extra instruction probably caused the thread to overrun a threshold and suspend to allow others to run. – Ali Cheaito Oct 22 '14 at 19:00
  • 1
    @AliC No dice, changing the check to the one you proposed gives (roughly) the same performance as the one without the null check. I'm not sure how thread management would come into play as this is a simple benchmarking app with no threading. – Cogman Oct 22 '14 at 19:03
  • 1
    @Cogman That's interesting. I wasn't able to recreate your results locally, which is why this feels like a resource issue rather than optimation. Care to try your benchmarks in a single threaded fashion? This will confirm beyond doubt that the null check is the issue. – Ali Cheaito Oct 22 '14 at 19:31
  • @AliC What do you mean by "In a single threaded fashion"? Currently it is single threaded. Are you talking about doing this without the anonymous inner class? – Cogman Oct 22 '14 at 19:34
  • @Cogman Sorry, my mind automatically associates Runnable with threads. Yes, I mean without the Runnable wrap. Just have bench() run the comparison loop directly. – Ali Cheaito Oct 22 '14 at 19:37
  • @AliC No dice, same results. I'll see if can mock something up that doesn't give away too much company code :) – Cogman Oct 22 '14 at 19:41
  • Incidentally, if performance is important to you here, you should consider using [`EnumSet<>`](http://docs.oracle.com/javase/7/docs/api/java/util/EnumSet.html) instead of a parameters array. It uses bit vectors to represent the enum values present in the set. If you make `dArray` an EnumSet, and call `.contains(value)`, I wouldn't be surprised if it went 100x faster, and there'd be no need for the `in()` method at all. – StriplingWarrior Oct 22 '14 at 20:28
  • @StriplingWarrior Already tested that. It gets roughly the same speed as doing the array search. Most of the arrays being checked just aren't big enough to benefit, `in` is generally used for 5 or less values (which can be iterated over really fast). – Cogman Oct 22 '14 at 20:42
  • @Cogman: That's surprising. I know you can iterate very quickly over small arrays, but that still involves several branches and jumps, whereas EnumSet should just be doing a bitmask operation. Are you reusing the same `EnumSet` or creating a new one each time? – StriplingWarrior Oct 22 '14 at 20:51
  • @StriplingWarrior Reusing. But you should realize that iterating over an array involves very little branching and is a highly local operation (meaning it is likely to take a lot of advantage of the CPUs cache). The type of branching is also highly predictable and optimized for in most modern CPUs. It is really quite hard to beat a linear search for small (less than 50) values. That being said, I did see differences once I started using all of the values in the enum in the `in` statement. That just doesn't happen often in the code. – Cogman Oct 22 '14 at 20:52
  • @StriplingWarrior But don't take my word for it :) You have the benchmark code that I'm using so you can see what I'm seeing pretty easily if you like. – Cogman Oct 22 '14 at 20:57
  • @Cogman: Indeed, that's what I'm seeing. It's surprising because in C# using a bit flag even against only 5 values is still over 20x faster than iterating over an array. In fact, even a HashSet is faster in C#. And the EnumSet documentation specifically says "The space and time performance of this class should be good enough to allow its use as a high-quality, typesafe alternative to traditional int-based 'bit flags.'" I'd be curious to see how it would change if you actually used bit flags. BTW, I tried your original code on http://www.browxy.com/ and can't seem to reproduce your results. – StriplingWarrior Oct 22 '14 at 21:32

1 Answers1

3

The null check doesn't accomplish anything and I'm also surprised it makes such a difference. But I believe your comments essentially answered your own question.

@Cogman wrote:

...iterating over an array involves very little branching and is a highly local operation (meaning it is likely to take a lot of advantage of the CPUs cache). The type of branching is also highly predictable and optimized for in most modern CPUs...

If you compile your class and use javap to print out the disassembled byte code for those two methods you will see:

  public boolean in(App$TransactionType...);
Code:
   0: aload_1       
   1: ifnull        9
   4: aload_1       
   5: arraylength   
   6: ifne          11
   9: iconst_0      
  10: ireturn       
  11: iconst_0      
  12: istore_2      
  13: iload_2       
  14: aload_1       
  15: arraylength   
  16: if_icmpge     34
  19: aload_1       
  20: iload_2       
  21: aaload        
  22: aload_0       
  23: if_acmpne     28
  26: iconst_1      
  27: ireturn       
  28: iinc          2, 1
  31: goto          13
  34: iconst_0      
  35: ireturn       

And also:

 public boolean inOld(App$TransactionType...);
Code:
   0: aload_1       
   1: ifnull        9
   4: aload_1       
   5: arraylength   
   6: ifne          11
   9: iconst_0      
  10: ireturn       
  11: iconst_0      
  12: istore_2      
  13: iload_2       
  14: aload_1       
  15: arraylength   
  16: if_icmpge     40
  19: aload_1       
  20: iload_2       
  21: aaload        
  22: ifnull        34
  25: aload_1       
  26: iload_2       
  27: aaload        
  28: aload_0       
  29: if_acmpne     34
  32: iconst_1      
  33: ireturn       
  34: iinc          2, 1
  37: goto          13
  40: iconst_0      
  41: ireturn       

Your new method removed six of the operations and one of the potential branch sites.

The loop was tight before, now its super tight.

I would have thought that Java would have JIT'd both these methods to essentially the same thing. Your timing numbers suggested otherwise.

Some random numbers:

1.6.33 32b : 646100 vs 727173

1.6.33 64b : 1667665 vs 2668513

1.7.67 32b : 661003 vs 716417

1.7.07 64b : 1663926 vs 32493989

1.7.60 64b : 1700574 vs 32368506

1.8.20 64b : 1648382 vs 32222823

All of the 64-bit JVM's execute both implementations much faster than the 32-bit versions.

Ryan
  • 2,061
  • 17
  • 28
  • Interesting. I didn't look at the java byte code (I'm not really very experienced with that). But I find it really interesting that the 32bit jvms seem to have a much smaller performance difference vs the 64bit jvms. That would explain why some people aren't seeing the same results. – Cogman Oct 23 '14 at 00:34