
Let's say I have a method with the following signature:

public int indexOf(byte[] bytes, byte toFind, int offset, int length) {
  ...
}

This method does something simple, like looking for the byte toFind in the range [offset, offset + length) of bytes. I want to check up front whether offset and length are valid for bytes, that is, whether the whole range [offset, offset + length) falls within bytes.

The explicit check would look something like:

if (offset < 0 || offset > bytes.length - length) {
  throw ...;  // bad santa!
}
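For reference, here is a minimal runnable sketch of the method with the explicit check filled in. The class name `ExplicitCheck`, the exception message, and the search loop are assumptions for illustration, since the question elides them. It also shows why the upper bound is written as `offset > bytes.length - length`: for a non-negative `length` that subtraction cannot overflow, whereas the more obvious `offset + length > bytes.length` can wrap around. The hypothetical `naiveCheckAccepts` helper exists only to demonstrate this.

```java
public class ExplicitCheck {

    // Returns the index of the first occurrence of toFind in
    // bytes[offset, offset + length), or -1 if there is none.
    public static int indexOf(byte[] bytes, byte toFind, int offset, int length) {
        // The question's up-front check. Like the question's final version,
        // it does not explicitly reject a negative length; if such a call
        // passes the check, the loop below simply runs zero times.
        if (offset < 0 || offset > bytes.length - length) {
            throw new IndexOutOfBoundsException(
                "offset=" + offset + ", length=" + length + ", array length=" + bytes.length);
        }
        for (int i = offset; i < offset + length; i++) {
            if (bytes[i] == toFind) {
                return i;
            }
        }
        return -1;
    }

    // For comparison only (hypothetical helper): the "obvious" form can
    // overflow. With offset = Integer.MAX_VALUE and length = 1,
    // offset + length wraps to Integer.MIN_VALUE and the range is accepted.
    public static boolean naiveCheckAccepts(byte[] bytes, int offset, int length) {
        return !(offset < 0 || offset + length > bytes.length);
    }

    public static void main(String[] args) {
        byte[] data = {1, 2, 3, 2};
        System.out.println(indexOf(data, (byte) 2, 2, 2));                 // prints 3
        System.out.println(naiveCheckAccepts(data, Integer.MAX_VALUE, 1)); // prints true
        try {
            indexOf(data, (byte) 2, Integer.MAX_VALUE, 1);
        } catch (IndexOutOfBoundsException e) {
            System.out.println("rejected: " + e.getMessage());
        }
    }
}
```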

It seems that I can perform this check more cheaply (in terms of emitted bytecode, and perhaps runtime performance) by performing "dummy" array accesses instead:

public int indexOf(byte[] bytes, byte toFind, int offset, int length) {
  int dummy = bytes[offset] + bytes[offset + length - 1];
  ...
}
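The dummy-access variant can be sketched the same way (again, the class name and loop are assumptions). It relies on the JVM's implicit bounds check: each `baload` throws `ArrayIndexOutOfBoundsException`, a subclass of `IndexOutOfBoundsException`, for an out-of-range index. One behavioral difference is worth checking before adopting it: when `length == 0`, the second read is `bytes[offset - 1]`, so an empty range at `offset == 0` is rejected even though the explicit check accepts it.

```java
public class DummyAccessCheck {

    // Same contract as the explicit-check version; the range check is left
    // to the JVM's implicit bounds checks on two array reads.
    public static int indexOf(byte[] bytes, byte toFind, int offset, int length) {
        // Each read throws ArrayIndexOutOfBoundsException if its index is
        // outside [0, bytes.length). The annotation is meant to silence the
        // unused-variable warning mentioned in the question.
        @SuppressWarnings("unused")
        int dummy = bytes[offset] + bytes[offset + length - 1];
        for (int i = offset; i < offset + length; i++) {
            if (bytes[i] == toFind) {
                return i;
            }
        }
        return -1;
    }
}
```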

I'd like to get rid of the int dummy and the + if I can, or at least reduce their cost. The compiler doesn't accept standalone accesses like bytes[offset];, presumably because such an expression usually has no side effects and is pointless (but not in this case, since it can throw). Using the dummy int also causes a compiler warning, which must be suppressed.

Any suggestions on how I can make this change with a minimum amount of bytecode? (Runtime performance is important here too, but I suspect that most solutions are optimized to the same thing, since the unused portions are dropped.)

BeeOnRope
  • Comments of the form "have you profiled this?", "is this really the bottleneck in your application", "will a few bytes/nanoseconds matter here", "smells like premature optimization" will be ignored. – BeeOnRope Jan 03 '13 at 22:59
  • Can you directly manipulate bytecode? Or are you limited to emitting Java code? – templatetypedef Jan 03 '13 at 23:03
  • @templatetypedef - .java code only. – BeeOnRope Jan 03 '13 at 23:07
  • Are you trying to minimize generated bytecode? Or minimize runtime? – templatetypedef Jan 03 '13 at 23:09
  • @BeeOnRope Note that your 2 examples are not equivalent. In particular, for offset = 3 and length = -1, the first throws an exception but the second does not. – assylias Jan 03 '13 at 23:09
  • @templatetypedef - at the core, I'm trying to minimize runtime, but people hate questions like that since it's highly compiler/JVM/call context dependent, so my stated aim is minimizing bytecode. I suspect that the best solutions might result in the same assembly after JIT anyway. Then bytecode is relevant since it affects inlining decisions. – BeeOnRope Jan 03 '13 at 23:13
  • You are right about the case of negative lengths, which I neglected to mention. I'm still deciding if I want to catch negative lengths or just continue (a negative length effectively is treated as a zero length in the remainder of the method). I will remove the `length < 0` from the explicit check to avoid confusing the issue. – BeeOnRope Jan 03 '13 at 23:15
  • In my opinion, what you have is already minimal in terms of bytecode size. You can use `javap -c YourClassName` to see the compiled VM instructions and compare different options. – gerrytan Jan 04 '13 at 00:21
  • What do you mean by bytecode length? Do you mean method bytecode length, or class bytecode length (i.e. including the constant pool)? Regarding performance: what JVM do you run it on? – v6ak Jan 06 '13 at 16:49
  • Method bytecode length. Hotspot. – BeeOnRope Jan 07 '13 at 17:22
  • I guess HotSpot does such a check anyway, even though it's not allowed to throw before entering the loop. It probably uses an uncommon trap asserting that the condition holds, so I guess you can't get more speed. Alternatively, you might do something like `if (bytes[offset] == bytes[offset + length - 1]) --length;`, since in that case there's no point in checking the last byte, but this is hacky and possibly slow. Or use `int i = 0 & bytes[offset] & bytes[offset + length - 1]` as your loop variable. – maaartinus Sep 13 '17 at 04:32
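maaartinus's last suggestion might look like the following sketch (the class name and the relative-index loop shape are assumptions). The counter's initializer `0 & bytes[offset] & bytes[offset + length - 1]` always evaluates to 0, but because `&` does not short-circuit, both bounds-checked reads still happen and no separate dummy variable is needed. Whether this is actually smaller than the question's version has not been verified here; `javap -c` would show it.

```java
public class LoopVarCheck {

    // Returns the index of the first occurrence of toFind in
    // bytes[offset, offset + length), or -1 if there is none.
    public static int indexOf(byte[] bytes, byte toFind, int offset, int length) {
        // 0 & x & y is always 0, but both array reads are still evaluated,
        // so an out-of-range offset or offset + length - 1 throws
        // ArrayIndexOutOfBoundsException before the loop starts.
        for (int i = 0 & bytes[offset] & bytes[offset + length - 1]; i < length; i++) {
            if (bytes[offset + i] == toFind) {
                return offset + i;
            }
        }
        return -1;
    }
}
```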

4 Answers


How about this?

if ((bytes[offset] | bytes[offset+length-1])==0) { }
Stefanos Kalantzis
  • Regarding instruction size, this is even longer than the question’s original solution. When considering the meta information of a local variable, simply disabling debug information when compiling will drop them. When code size is an issue, turning off debug information would be the first step anyway. – Holger Jan 19 '17 at 13:03
  • @Holger - any meta info about locals is stored out of line from the actual bytecode, right? So disabling it won't make the bytecode smaller, which was my thrust here (eg because bytecode size is critical in inlining decisions). – BeeOnRope Jan 19 '17 at 15:21
  • @BeeOnRope: exactly. When considering inlining decisions, this answer’s code is just one byte bigger than your question’s code. Note that v6ak’s answer has a slightly different variant, using `bytes[offset] == bytes[offset+length-1]` instead of `(bytes[offset] | bytes[offset+length-1])==0`. This is one byte smaller, hence exactly as big as your original code. Note that when moving the check into another method and share it with other methods, there is a bigger chance for it to become “hot” and for hot methods there are (much) bigger inlining thresholds, raising the chance of becoming inlined. – Holger Jan 19 '17 at 15:55
  • Yeah, I still don't have a full mental model of how the inlining heuristic works. I know about the two thresholds (30-something and 300-something bytes, IIRC) for normal and hot methods, but I don't understand, for example, if hotness is related to a caller-callee pair, or just depends on the callee, etc. – BeeOnRope Jan 19 '17 at 16:23

The cheapest way in terms of bytecode length is the one used by several JRE classes, e.g. ByteBuffer or ArrayList: use a dedicated check method.

Starting with Java 9, there is a standard method for this purpose. It is starting to supersede these internal check methods and to become the central place for these kinds of checks, which also makes it a method the JVM’s optimizer can recognize if it has a way of optimizing this check:
Objects.checkFromIndexSize(offset, length, bytes.length);
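Applied to the question's method, the Java 9 check might look like this sketch (the class name and loop are assumptions). Note that `Objects.checkFromIndexSize` is stricter than the question's final check in one direction and more lenient than the dummy-access trick in another: it rejects a negative `length`, and it accepts empty ranges such as `offset == 0, length == 0` or `offset == bytes.length, length == 0`.

```java
import java.util.Objects;

public class StandardCheck {

    // Returns the index of the first occurrence of toFind in
    // bytes[offset, offset + length), or -1 if there is none.
    public static int indexOf(byte[] bytes, byte toFind, int offset, int length) {
        // Java 9+: throws IndexOutOfBoundsException if offset < 0,
        // length < 0, or offset + length > bytes.length (the sum is
        // checked without int overflow).
        Objects.checkFromIndexSize(offset, length, bytes.length);
        for (int i = offset; i < offset + length; i++) {
            if (bytes[i] == toFind) {
                return i;
            }
        }
        return -1;
    }
}
```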


In comparison to other approaches:

  • Using an array access with a dummy variable:

    public int indexOf1(byte[] bytes, byte toFind, int offset, int length) {
        int dummy = bytes[offset] + bytes[offset + length - 1];
        //...
    }
    

    compiles to

     0: aload_1
     1: iload_3
     2: baload
     3: aload_1
     4: iload_3
     5: iload         4
     7: iadd
     8: iconst_1
     9: isub
    10: baload
    11: iadd
    12: istore        5
    14: ...
    
  • Using an array access and a dummy branch instruction

    public int indexOf2(byte[] bytes, byte toFind, int offset, int length) {
        if ((bytes[offset] | bytes[offset+length-1])==0) { }
        //...
    }
    

    compiles to

     0: aload_1
     1: iload_3
     2: baload
     3: aload_1
     4: iload_3
     5: iload         4
     7: iadd
     8: iconst_1
     9: isub
    10: baload
    11: ior
    12: ifne          15
    15: ...
    
  • Using a dedicated check method

    public int indexOf3(byte[] bytes, byte toFind, int offset, int length) {
        checkIndex(bytes, offset, length);
        //...
    }
    
    private void checkIndex(byte[] bytes, int offset, int length) {
       //...
    }
    

    compiles to

     0: aload_0
     1: aload_1
     2: iload_3
     3: iload         4
     5: invokespecial #23                 // Method checkIndex:([BII)V
     8: ...
    

So delegating wins when looking at the calling code. It also does not introduce local variables on the caller side. The additional space required by the implementation method pays off as soon as more than one method uses it. Such a method is usually private or static, so it is invoked by an instruction without dynamic dispatch and gets inlined at runtime; there will be no performance difference. The JVM will usually inline methods that small, regardless of whether they are a hot spot or not.

When comparing the performance of an implicit array bounds check with an explicit comparison, there is no reason why either should be faster than the other. They are basically doing the same and in either case, the JVM can elide them if it can prove that the caller will always pass valid numbers only.

By the way, Buffer.checkBounds guides you to an implementation with only a single conditional:

private void checkIndex(byte[] bytes, int offset, int length) {
    if ((offset | length | (offset + length) | (bytes.length - (offset + length))) < 0)
        throw new IndexOutOfBoundsException();
}

Unlike the array access variant, this also handles the case where length is negative (but offset+length would yield a valid index).
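A small sketch of that difference, using `offset = 3` and `length = -1` (the case assylias raised under the question). `checkIndex` is copied from the answer but made `static`; `dummyAccess` and `throwsIoobe` are hypothetical helpers added only for this demonstration. The `(offset + length)` term also catches `int` overflow: for non-negative `offset` and `length`, a sum that exceeds `Integer.MAX_VALUE` wraps to a negative value.

```java
public class BoundsCheckDemo {

    // Single-branch check in the style of Buffer.checkBounds: the OR of the
    // four values is negative if and only if at least one of them is negative.
    public static void checkIndex(byte[] bytes, int offset, int length) {
        if ((offset | length | (offset + length) | (bytes.length - (offset + length))) < 0) {
            throw new IndexOutOfBoundsException();
        }
    }

    // The dummy-access variant, for comparison.
    public static void dummyAccess(byte[] bytes, int offset, int length) {
        @SuppressWarnings("unused")
        int dummy = bytes[offset] + bytes[offset + length - 1];
    }

    // Returns true if running r throws an IndexOutOfBoundsException
    // (ArrayIndexOutOfBoundsException is a subclass).
    public static boolean throwsIoobe(Runnable r) {
        try {
            r.run();
            return false;
        } catch (IndexOutOfBoundsException e) {
            return true;
        }
    }

    public static void main(String[] args) {
        byte[] data = new byte[8];
        // Both indices the dummy reads touch (3 and 1) are valid, so it passes;
        // checkIndex rejects the negative length.
        System.out.println(throwsIoobe(() -> dummyAccess(data, 3, -1))); // prints false
        System.out.println(throwsIoobe(() -> checkIndex(data, 3, -1)));  // prints true
        // A valid range that ends exactly at the array's end passes.
        System.out.println(throwsIoobe(() -> checkIndex(data, 2, 6)));   // prints false
    }
}
```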

Holger

I suspect that just executing bytes[offset] and bytes[offset + length - 1] is the cheapest way. The shortest way in JVM bytecode would be to just execute these expressions and leave the results on the operand stack.

However, you can't do that in Java. You also can't get a pop2 instruction (or two pop instructions), because bytes[something]; is not a valid Java statement. There are three potentially best ways:

  1. Use a method call like int java.lang.Math.max(int, int). This adds one 3-byte invokestatic instruction and one 1-byte pop instruction, so it is a 4-byte overhead. You can save one byte if you write a static dummy method with two int arguments and a void result. An intelligent JVM optimizer would probably reduce this code to one pop2 instruction, since Math.max(...) has no side effects and you discard the result with the pop instruction. However, I am not sure if this applies to HotSpot.
  2. Assign it to a local variable. One assignment means one istore instruction. If you have five parameters (including this, because the method is not static), you use the generic 2-byte istore instead of the 1-byte istore_<n> (for n in {0, 1, 2, 3}). If you had at most three parameters, you would probably save something by reducing the scope of the dummy variable.
  3. Compare the values (producing a boolean) and use an empty branch, i.e. if (bytes[offset] == bytes[offset+length-1]) { }. In this case, you don't need any extra method call (like max), any extra instruction (like pop2), or any extra local variable (which enlarges the local variable table).
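The three options can be written out as follows. This is a sketch for comparing them with `javap -c`, not v6ak's actual test code (that is in the linked gist), and the class and method names are hypothetical. The methods are `static` with three parameters, so their local-variable slots, and therefore the exact `istore` encoding in option 2, differ from the five-slot instance method discussed above.

```java
public class DummyVariants {

    // 1. Pass both reads to a method and discard the result
    //    (invokestatic + pop). A static void method taking two ints
    //    would save the pop.
    public static void viaMethodCall(byte[] bytes, int offset, int length) {
        Math.max(bytes[offset], bytes[offset + length - 1]);
    }

    // 2. Store the combined reads in an otherwise unused local variable.
    public static void viaLocal(byte[] bytes, int offset, int length) {
        @SuppressWarnings("unused")
        int dummy = bytes[offset] + bytes[offset + length - 1];
    }

    // 3. Compare the two reads and branch to an empty block.
    public static void viaEmptyBranch(byte[] bytes, int offset, int length) {
        if (bytes[offset] == bytes[offset + length - 1]) { }
    }
}
```

All three throw `ArrayIndexOutOfBoundsException` for the same out-of-range arguments; they differ only in the instructions that consume the two loaded values.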

If you don't use any further optimizer and you don't modify the method signature to use fewer variables, the third way is probably the winner. In my simple test, it requires only 16 bytes of instructions (some other implementations are equal, but not better) and does not require anything more in the local variable table or constant pool. You can probably save several bytes with manual bytecode optimisations or with ProGuard. But be careful: ProGuard may optimize it too much and remove the array access. (I am not sure, but its documentation claims that it may remove some NullPointerExceptions.)

See https://gist.github.com/4523924

v6ak

Not sure about the bytecode length, but how about:

bytes[offset] |= bytes[offset];
bytes[offset + length - 1] |= bytes[offset + length - 1];
mindas
  • Why not `bytes[offset] |= 0; bytes[offset + length - 1] |= 0;`? That’s still longer than the alternatives, but significantly shorter than repeating the array access expressions. – Holger Jan 19 '17 at 13:42
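Holger's shorter variant could be sketched like this (the class and method names are hypothetical). Unlike the read-only approaches, each `|= 0` writes the element back. The value does not change, but the write could still matter for an array that another thread modifies concurrently, since the read-modify-write is not atomic.

```java
public class CompoundAssignCheck {

    // "x |= 0" leaves the value unchanged but still loads and stores each
    // element, so an out-of-range index throws ArrayIndexOutOfBoundsException.
    public static void check(byte[] bytes, int offset, int length) {
        bytes[offset] |= 0;
        bytes[offset + length - 1] |= 0;
    }
}
```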