44

In a recent discussion about how to optimize some code, I was told that breaking code up into lots of small methods can significantly increase performance, because the JIT compiler doesn't like to optimize large methods.

I wasn't sure about this since it seems that the JIT compiler should itself be able to identify self-contained segments of code, irrespective of whether they are in their own method or not.

Can anyone confirm or refute this claim?

sanity
  • The general JIT compilation process consists of these steps: http://publib.boulder.ibm.com/infocenter/java7sdk/v7r0/index.jsp?topic=%2Fcom.ibm.java.win.70.doc%2Fdiag%2Funderstanding%2Fjit_overview.html, but it does not talk about how the JIT handles modules, big or small – AurA Apr 02 '13 at 04:53

4 Answers

30

The HotSpot JIT only inlines methods that are smaller than a certain (configurable) size. So using smaller methods allows more inlining, which is good.

See the various inlining options on this page.


EDIT

To elaborate a little:

  • if a method is small it will get inlined, so there is little chance of being penalised for splitting the code into small methods.
  • in some instances, splitting methods may result in more inlining.

Example (full code to have the same line numbers if you try it)

package javaapplication27;

public class TestInline {
    private int count = 0;

    public static void main(String[] args) throws Exception {
        TestInline t = new TestInline();
        int sum = 0;
        for (int i = 0; i < 1000000; i++) {
            sum += t.m();
        }
        System.out.println(sum);
    }

    public int m() {
        int i = count;
        if (i % 10 == 0) {
            i += 1;
        } else if (i % 10 == 1) {
            i += 2;
        } else if (i % 10 == 2) {
            i += 3;
        }
        i += count;
        i *= count;
        i++;
        return i;
    }
}

When running this code with the following JVM flags: -XX:+UnlockDiagnosticVMOptions -XX:+PrintCompilation -XX:FreqInlineSize=50 -XX:MaxInlineSize=50 -XX:+PrintInlining (yes, I have chosen values that prove my case: m is too big, but both the refactored m and m2 are below the threshold; with other values you might get a different output).

You will see that m() and main() get compiled, but m() does not get inlined:

 56    1             javaapplication27.TestInline::m (62 bytes)
 57    1 %           javaapplication27.TestInline::main @ 12 (53 bytes)
          @ 20   javaapplication27.TestInline::m (62 bytes)   too big

You can also inspect the generated assembly to confirm that m is not inlined (I used these JVM flags: -XX:+PrintAssembly -XX:PrintAssemblyOptions=intel) - it will look like this:

0x0000000002780624: int3   ;*invokevirtual m
                           ; - javaapplication27.TestInline::main@20 (line 10)

If you refactor the code like this (I have extracted the if/else in a separate method):

public int m() {
    int i = count;
    i = m2(i);
    i += count;
    i *= count;
    i++;
    return i;
}

public int m2(int i) {
    if (i % 10 == 0) {
        i += 1;
    } else if (i % 10 == 1) {
        i += 2;
    } else if (i % 10 == 2) {
        i += 3;
    }
    return i;
}

You will see the following compilation actions:

 60    1             javaapplication27.TestInline::m (30 bytes)
 60    2             javaapplication27.TestInline::m2 (40 bytes)
            @ 7   javaapplication27.TestInline::m2 (40 bytes)   inline (hot)
 63    1 %           javaapplication27.TestInline::main @ 12 (53 bytes)
            @ 20   javaapplication27.TestInline::m (30 bytes)   inline (hot)
            @ 7   javaapplication27.TestInline::m2 (40 bytes)   inline (hot)

So m2 gets inlined into m, which you would expect, so we are back to the original scenario. But when main gets compiled, it actually inlines the whole thing. At the assembly level, it means you won't find any invokevirtual instructions any more. You will find lines like this:

 0x00000000026d0121: add    ecx,edi   ;*iinc
                                      ; - javaapplication27.TestInline::m2@7 (line 33)
                                      ; - javaapplication27.TestInline::m@7 (line 24)
                                      ; - javaapplication27.TestInline::main@20 (line 10)

where the instructions of main, m and m2 have essentially been merged into one compiled body.

Conclusion

I am not saying that this example is representative but it seems to prove a few points:

  • using smaller methods improves readability in your code
  • smaller methods will generally be inlined, so you will most likely not pay the cost of the extra method call (it will be performance neutral)
  • using smaller methods might improve inlining globally in some circumstances, as shown by the example above

And finally: if a portion of your code is so performance-critical that these considerations matter, you should examine the JIT output to fine-tune your code, and most importantly profile before and after.

assylias
  • 1
    The relevant option being `-XX:InlineSmallCode=n` – Raedwald Apr 02 '13 at 12:34
  • "So using smaller methods allows more inlining, which is good" but if the methods that it is inlining exist only because a large method has been split into several methods, nothing has been gained. – Raedwald Apr 02 '13 at 12:36
  • 1
    @Raedwald `FreqInlineSize` and `MaxInlineSize` are also relevant depending on what you are trying to achieve. Regarding your second comment: it is not that straigthforward - if `m1()` calls `m2()` and both are within the inlining limits, the whole of `m1 + m2` will probably be inlined in the calling site (if it is called often enough etc.). If on the other hand you merge the two methods into a `m()` that is longer than the limit, nothing gets inlined and you lose that optimisation. I'll try to add an example later. – assylias Apr 02 '13 at 13:28
  • 1
    @assylias, can you take a look at my answer and the JMH benchmark code and comment with your thoughts on whether it is valid evidence that smaller methods can give improved performance? – Bobulous Jun 26 '19 at 12:25
7

If you take the exact same code and just break it up into lots of small methods, that is not going to help the JIT at all.

A better way to put it is that modern HotSpot JVMs do not penalize you for writing a lot of small methods. They do get aggressively inlined, so at runtime you do not really pay the cost of function calls. This is true even for invokevirtual calls, such as the one that calls an interface method.
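
To illustrate the point about interface calls, here is a minimal sketch (the class and method names are mine, not from the post): with only one implementation of the interface loaded, the call site is monomorphic, and HotSpot can devirtualize and inline the call even though the bytecode is an invokeinterface.

```java
// Hypothetical demo (names invented for illustration): a hot, monomorphic
// interface call that HotSpot can devirtualize and inline.
interface Shape {
    long area();
}

final class Square implements Shape {
    private final long side;
    Square(long side) { this.side = side; }
    @Override public long area() { return side * side; }
}

public class InterfaceInlineDemo {
    // With only Square loaded, this call site is monomorphic, so HotSpot
    // can devirtualize and inline area() once the loop gets hot.
    static long total(Shape s, int reps) {
        long sum = 0;
        for (int i = 0; i < reps; i++) {
            sum += s.area();
        }
        return sum;
    }

    public static void main(String[] args) {
        // Run with -XX:+UnlockDiagnosticVMOptions -XX:+PrintInlining to check
        // whether Square::area is reported as "inline (hot)".
        System.out.println(total(new Square(3), 1_000_000)); // prints 9000000
    }
}
```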

I wrote a blog post several years ago that describes how you can see that the JVM is inlining methods. The technique is still applicable to modern JVMs. I also found it useful to look at the discussions related to invokedynamic, where how modern HotSpot JVMs compile Java bytecode is discussed extensively.

Kohsuke Kawaguchi
  • "*If you take the exact same code and just break them up into lots of small methods, that is not going to help JIT at all.*" => I don't think that is accurate (at least on hotspot). – assylias Apr 02 '13 at 13:29
  • The reason I say that is basically the same as what @Raedwald wrote in another answer, quote "but if the methods that it is inlining exist only because a large method has been split into several methods, nothing has been gained". Would you elaborate why you think it's not accurate? – Kohsuke Kawaguchi Apr 02 '13 at 14:57
  • See the example I have provided - with one big method: no inlining, with two smaller methods doing the exact same thing, both get inlined. – assylias Apr 02 '13 at 15:02
  • @assylias perhaps you missed "the exact same code" part? The argument is that HotSpot aggressively optimizes method calls away. – Thorbjørn Ravn Andersen Apr 03 '13 at 13:41
  • @ThorbjørnRavnAndersen Not sure I follow you: in the first example I gave in my answer, `main` calls `m` which is too long to be inlined and in the second example, `main` calls `m1` which calls `m2` and both get inlined into `main` - and the code of `m` is exactly the same as `m1` + `m2`. So *taking the exact same code and just breaking it into lots of small methods* does improve performance in that example. – assylias Apr 03 '13 at 16:23
3

I've read numerous articles which have stated that smaller methods (as measured in the number of bytes required to represent the method as Java bytecode) are more likely to be eligible for inlining by the JIT (just-in-time compiler) when it compiles hot methods (those which are being run most frequently) into machine code. And they describe how method inlining produces better performance of the resulting machine code. In short: smaller methods give the JIT more options in terms of how to compile bytecode into machine code when it identifies a hot method, and this allows more sophisticated optimizations.

To test this theory, I created a JMH class with two benchmark methods, each containing identical behaviour but factored differently. The first benchmark is named monolithicMethod (all code in a single method), and the second benchmark is named smallFocusedMethods and has been refactored so that each major behaviour has been moved out into its own method. The smallFocusedMethods benchmark looks like this:

@Benchmark
public void smallFocusedMethods(TestState state) {
    int i = state.value;
    if (i < 90) {
        actionOne(i, state);
    } else {
        actionTwo(i, state);
    }
}

private void actionOne(int i, TestState state) {
    state.sb.append(Integer.toString(i)).append(
            ": has triggered the first type of action.");
    int result = i;
    for (int j = 0; j < i; ++j) {
        result += j;
    }
    state.sb.append("Calculation gives result ").append(Integer.toString(
            result));
}

private void actionTwo(int i, TestState state) {
    state.sb.append(i).append(" has triggered the second type of action.");
    int result = i;
    for (int j = 0; j < 3; ++j) {
        for (int k = 0; k < 3; ++k) {
            result *= k * j + i;
        }
    }
    state.sb.append("Calculation gives result ").append(Integer.toString(
            result));
}

and you can imagine how monolithicMethod looks (same code but entirely contained within the one method). The TestState simply does the work of creating a new StringBuilder (so that the creation of this object is not counted in the benchmark time) and of choosing a random number between 0 and 100 for each invocation (and this has been deliberately configured so that both benchmarks use exactly the same sequence of random numbers, to avoid the risk of bias).
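
As a sanity check, here is a plain-Java reconstruction of the two shapes being compared (my own sketch, not the actual JMH code: I infer the monolithic body by merging actionOne and actionTwo into the if/else, and a StringBuilder parameter stands in for the TestState object), verifying that both factorings compute the same thing:

```java
import java.util.Random;

// Hypothetical reconstruction of the two benchmark shapes, without JMH.
public class FactoringDemo {

    // All behaviour in a single method, as in the monolithicMethod benchmark.
    static String monolithic(int i) {
        StringBuilder sb = new StringBuilder();
        if (i < 90) {
            sb.append(Integer.toString(i)).append(
                    ": has triggered the first type of action.");
            int result = i;
            for (int j = 0; j < i; ++j) {
                result += j;
            }
            sb.append("Calculation gives result ").append(Integer.toString(result));
        } else {
            sb.append(i).append(" has triggered the second type of action.");
            int result = i;
            for (int j = 0; j < 3; ++j) {
                for (int k = 0; k < 3; ++k) {
                    result *= k * j + i;
                }
            }
            sb.append("Calculation gives result ").append(Integer.toString(result));
        }
        return sb.toString();
    }

    // The same behaviour factored into small methods.
    static String smallMethods(int i) {
        StringBuilder sb = new StringBuilder();
        if (i < 90) {
            actionOne(i, sb);
        } else {
            actionTwo(i, sb);
        }
        return sb.toString();
    }

    static void actionOne(int i, StringBuilder sb) {
        sb.append(Integer.toString(i)).append(
                ": has triggered the first type of action.");
        int result = i;
        for (int j = 0; j < i; ++j) {
            result += j;
        }
        sb.append("Calculation gives result ").append(Integer.toString(result));
    }

    static void actionTwo(int i, StringBuilder sb) {
        sb.append(i).append(" has triggered the second type of action.");
        int result = i;
        for (int j = 0; j < 3; ++j) {
            for (int k = 0; k < 3; ++k) {
                result *= k * j + i;
            }
        }
        sb.append("Calculation gives result ").append(Integer.toString(result));
    }

    public static void main(String[] args) {
        // Both variants must produce identical output for every input.
        Random r = new Random(42);
        for (int n = 0; n < 1_000; n++) {
            int i = r.nextInt(100);
            if (!monolithic(i).equals(smallMethods(i))) {
                throw new AssertionError("mismatch at input " + i);
            }
        }
        System.out.println("identical");
    }
}
```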

After running the benchmark with six "forks", each involving five warmups of one second, followed by six iterations of five seconds, the results look like this:

Benchmark                                         Mode   Cnt        Score        Error   Units

monolithicMethod                                  thrpt   30  7609784.687 ± 118863.736   ops/s
monolithicMethod:·gc.alloc.rate                   thrpt   30     1368.296 ±     15.834  MB/sec
monolithicMethod:·gc.alloc.rate.norm              thrpt   30      270.328 ±      0.016    B/op
monolithicMethod:·gc.churn.G1_Eden_Space          thrpt   30     1357.303 ±     16.951  MB/sec
monolithicMethod:·gc.churn.G1_Eden_Space.norm     thrpt   30      268.156 ±      1.264    B/op
monolithicMethod:·gc.churn.G1_Old_Gen             thrpt   30        0.186 ±      0.001  MB/sec
monolithicMethod:·gc.churn.G1_Old_Gen.norm        thrpt   30        0.037 ±      0.001    B/op
monolithicMethod:·gc.count                        thrpt   30     2123.000               counts
monolithicMethod:·gc.time                         thrpt   30     1060.000                   ms

smallFocusedMethods                               thrpt   30  7855677.144 ±  48987.206   ops/s
smallFocusedMethods:·gc.alloc.rate                thrpt   30     1404.228 ±      8.831  MB/sec
smallFocusedMethods:·gc.alloc.rate.norm           thrpt   30      270.320 ±      0.001    B/op
smallFocusedMethods:·gc.churn.G1_Eden_Space       thrpt   30     1393.473 ±     10.493  MB/sec
smallFocusedMethods:·gc.churn.G1_Eden_Space.norm  thrpt   30      268.250 ±      1.193    B/op
smallFocusedMethods:·gc.churn.G1_Old_Gen          thrpt   30        0.186 ±      0.001  MB/sec
smallFocusedMethods:·gc.churn.G1_Old_Gen.norm     thrpt   30        0.036 ±      0.001    B/op
smallFocusedMethods:·gc.count                     thrpt   30     1986.000               counts
smallFocusedMethods:·gc.time                      thrpt   30     1011.000                   ms

In short, these numbers show that the smallFocusedMethods approach ran 3.2% faster, and the difference was statistically significant (with 99.9% confidence). And note that the memory usage (based on garbage collection profiling) was not significantly different. So you get faster performance without increased overhead.

I've run a variety of similar benchmarks to test whether small, focused methods give better throughput, and I've found that the improvement is between 3% and 7% in all cases I've tried. But it's likely that the actual gain depends strongly upon the version of the JVM being used, the distribution of executions across your if/else blocks (I've gone for 90% on the first and 10% on the second to exaggerate the heat on the first "action", but I've seen throughput improvements even with a more equal spread across a chain of if/else blocks), and the actual complexity of the work being done by each of the possible actions. So be sure to write your own specific benchmarks if you need to determine what works for your specific application.

My advice is this: write small, focused methods because it makes the code tidier, easier to read, and much easier to override specific behaviours when inheritance is involved. The fact that the JIT is likely to reward you with slightly better performance is a bonus, but tidy code should be your main goal in the majority of cases. Oh, and it's also important to give each method a clear, descriptive name which exactly summarises the responsibility of the method (unlike the terrible names I've used in my benchmark).

Bobulous
  • I don't think smaller methods always result in tidier code, though. Sometimes the functionality of a method is strongly coupled with the context where it's called (which simultaneously means that it would be called nowhere else, no one would want to override it, and it would be hard to summarize the functionality with a sensible method name). In that case, doesn't "inlining" the method help readability more? – Imperishable Night Jun 22 '19 at 18:43
  • I agree totally, and methods shouldn't be split out just to pursue the goal of having as few lines of code within a method as possible. But if you stick to the advice that each method should be [responsible for just one thing](https://en.wikipedia.org/wiki/Single_responsibility_principle) then you'll probably find that your methods remain small anyway. – Bobulous Jun 23 '19 at 15:04
1

I don't really understand how it works, but based on the link AurA provided, I would guess that the JIT compiler will have to compile less bytecode if the same bits are being reused, rather than having to compile different bytecode that is similar across different methods.

Aside from that, the more you are able to break your code down into meaningful pieces, the more reuse you will get out of it, and that gives the VM running it more to work with when optimizing (you are providing more structure to work with).

However, I doubt it will have any positive impact if you break your code down arbitrarily, in a way that provides no code reuse.

Ben Barkay