
I'm looking into some Android performance issues at the moment and noticing some sub-optimal patterns in the dex code. I'm just wondering if anyone knows if this is to be expected, and what the rationale behind it might be.

For example, consider the following Java code:

m_testField += i;

doSomething(m_testField);

When this is built and the output is run through baksmali, it looks like the following:

iget v1, p0, Lcom/example/MainActivity$FieldTest;->m_testField:I

add-int/2addr v1, v0

iput v1, p0, Lcom/example/MainActivity$FieldTest;->m_testField:I

iget v1, p0, Lcom/example/MainActivity$FieldTest;->m_testField:I

invoke-direct {p0, v1}, Lcom/example/MainActivity$FieldTest;->doSomething(I)V 

The part that concerns me is the second iget opcode, which reads the value of the instance field back into register v1. The field was written from that very same v1 register by the preceding iput opcode, so this iget appears to be completely redundant.

The only thing I can think of is that this is done to make the code more thread-safe. But surely that should be the programmer's responsibility (by using synchronized blocks) rather than the compiler's. Although I'm not 100% certain, I think the above behaviour is quite different from what most C/C++ compilers would do.
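If `m_testField` really is shared between threads, the programmer-managed synchronization mentioned above might look like the following sketch. The class name `SyncFieldTest`, the `lock` object, `lastArg`, and the `get`/`demo` helpers are hypothetical. The point it illustrates is that the whole read-modify-write happens under one lock, and the value passed to `doSomething` is captured inside that lock, so it is exactly the value this thread stored. The unsynchronized second `iget` gives no such guarantee.

```java
import java.util.ArrayList;
import java.util.List;

// Sketch only: names other than m_testField and doSomething are hypothetical.
public class SyncFieldTest {
    private final Object lock = new Object();
    private int m_testField;          // guarded by lock
    private volatile int lastArg;     // records what doSomething last received

    public void update(int i) {
        int snapshot;
        synchronized (lock) {
            m_testField += i;         // read, add and write are atomic with respect to other update() calls
            snapshot = m_testField;   // read inside the lock: the value this thread just wrote
        }
        doSomething(snapshot);
    }

    private void doSomething(int value) {
        lastArg = value;
    }

    public int get() {
        synchronized (lock) {
            return m_testField;
        }
    }

    // Starts `threads` threads, each calling update(1) `perThread` times,
    // waits for them all, and returns the final field value.
    public static int demo(int threads, int perThread) {
        SyncFieldTest t = new SyncFieldTest();
        List<Thread> workers = new ArrayList<>();
        for (int n = 0; n < threads; n++) {
            Thread w = new Thread(() -> {
                for (int k = 0; k < perThread; k++) {
                    t.update(1);
                }
            });
            workers.add(w);
            w.start();
        }
        try {
            for (Thread w : workers) {
                w.join();
            }
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            throw new IllegalStateException("interrupted while waiting for workers", e);
        }
        return t.get();
    }

    public static void main(String[] args) {
        System.out.println(demo(4, 10_000)); // prints 40000: no updates are lost
    }
}
```

Without the lock, concurrent `update` calls could lose increments, which is the race between the first `iget` and the `iput` discussed in the comments.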

I should say that essentially the same dex is produced when ProGuard is used. I should also probably mention that I'm using the very latest Android tools and a late model JDK.

JesusFreke
  • If a second thread changes m_testField the second iget command is necessary. – Robert Mar 17 '15 at 15:31
  • I believe the optimizer is allowed to eliminate the redundant load unless the field is marked volatile -- in the absence of synchronization operations, the compiler need not anticipate interference from other threads. – fadden Mar 17 '15 at 15:42
  • @Robert - Another thread could also modify the field between the first iget and the iput. So I don't think thread safety is behind this. As I said in my answer to my own question, I reckon this is because the standard JVM is stack-based. – Dave Mc in Cork Mar 17 '15 at 16:57

2 Answers


Every access to a field is independent. To get the behavior you describe, you need to add an extra local variable:

int local = m_testField; // iget
local = local + i;
m_testField = local; // iput
doSomething(local);

That said, some combination of the interpreter, just-in-time compiler and ahead-of-time compiler may end up making these optimizations for you at runtime anyway.
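A self-contained sketch that puts the original form and the local-variable form side by side, so the two can be compared with `javap -c` or baksmali. Apart from `m_testField` and `doSomething`, the names (`LocalCopyTest`, `updateDirect`, `updateWithLocal`, `lastArg`, `field`) are hypothetical. In single-threaded use both methods leave the object in the same state; the rewrite should compile to a single `getfield` (and so a single `iget`) per call.

```java
// Sketch only: compares the field-based code with the local-variable rewrite.
public class LocalCopyTest {
    private int m_testField;
    private int lastArg; // records what doSomething last received

    // Original form: getfield, iadd, putfield, then a second getfield for the argument.
    public void updateDirect(int i) {
        m_testField += i;
        doSomething(m_testField);
    }

    // Rewrite: one getfield and one putfield; the argument comes from the local.
    public void updateWithLocal(int i) {
        int local = m_testField;
        local = local + i;
        m_testField = local;
        doSomething(local);
    }

    private void doSomething(int value) {
        lastArg = value;
    }

    public int field() { return m_testField; }
    public int lastArg() { return lastArg; }

    public static void main(String[] args) {
        LocalCopyTest a = new LocalCopyTest();
        LocalCopyTest b = new LocalCopyTest();
        for (int i = 1; i <= 3; i++) {
            a.updateDirect(i);
            b.updateWithLocal(i);
        }
        // Single-threaded, both forms end with field == 6 and last argument == 6.
        System.out.println(a.field() + " " + a.lastArg() + " / " + b.field() + " " + b.lastArg()); // prints "6 6 / 6 6"
    }
}
```

The cost is the one raised in the comment on this answer: the rewrite is more verbose and has to be applied by hand at every call site.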

Jesse Wilson
  • Thanks for the response. Yes, I was aware that using a local would avoid the second iget, but it makes the Java code a bit ugly. It also isn't a viable solution where there is a very large quantity of existing code, some of which is delivered as a library built by a different part of the organisation. It would be possible to develop a tool to post-process the dex and remove the redundant opcodes. I guess my real question is whether there is something in the Java language spec that requires all field accesses to be independent. – Dave Mc in Cork Mar 17 '15 at 11:22
  • In this case, the bytecode is a literal translation of the source code. But this should disappear at runtime anyway so you're not paying for this. – Jesse Wilson Mar 18 '15 at 01:40

On a hunch, I've done some further research and I think I'm in a position to answer my own question...

The sub-optimal dex seems to be a by-product of the fact that it is generated from standard Java bytecode which is stack-based rather than register-based. I disassembled the .class file corresponding to the sample code in my question. The relevant section looks like this:

5: aload_0       
6: dup           
7: getfield      #22                 // Field m_testField:I
10: iload_1       
11: iadd          
12: putfield      #22                 // Field m_testField:I
15: aload_0       
16: aload_0       
17: getfield      #22                 // Field m_testField:I
20: invokespecial #33                 // Method doSomething:(I)V

After the iadd at offset 11 executes, the new value for m_testField is on top of the stack and the 'this' reference is second from the top. The problem is that the putfield at offset 12 pops both of them. This means the field value has to be pushed again: the aload_0 at offset 15 pushes the receiver for the doSomething call, the aload_0 at offset 16 pushes 'this' once more, and the getfield at offset 17 re-reads the field.
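The stack machine can avoid the reload when the source asks for it. The JVM has `dup_x1`, which copies the top stack value beneath the value below it. When a compound assignment is used as an expression, as in `doSomething(m_testField += i)`, javac typically emits `aload_0, aload_0, dup, getfield, iload_1, iadd, dup_x1, putfield, invokespecial`, keeping a copy of the new value on the stack across the `putfield`. Treat this as a sketch to check with `javap -c` on your own toolchain; whether dx then produces dex without the second `iget` has not been verified here. The names `DupX1Test`, `update`, `lastArg` and `field` are hypothetical.

```java
// Sketch only: folds the call and the compound assignment into one expression.
public class DupX1Test {
    private int m_testField;
    private int lastArg; // records what doSomething last received

    public void update(int i) {
        // The result of an assignment expression is the variable's value after
        // the assignment (JLS 15.26), so doSomething receives the new field value.
        doSomething(m_testField += i);
    }

    private void doSomething(int value) {
        lastArg = value;
    }

    public int field() { return m_testField; }
    public int lastArg() { return lastArg; }

    public static void main(String[] args) {
        DupX1Test t = new DupX1Test();
        t.update(5);
        t.update(7);
        System.out.println(t.field() + " " + t.lastArg()); // prints "12 12"
    }
}
```

Whether this is worth the loss of readability is debatable; as the other answer notes, the JIT or AOT compiler may remove the redundant load at runtime anyway.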

I must say I'm pretty surprised by this inefficiency. I'd have thought that the dx tool that converts bytecode to dex would be clever enough to remove this redundancy. I'm just hoping that ART is clever enough to do this at runtime instead.

  • The surprise you feel, perhaps mingled with confusion and frustration, is you hearing "the call". Deep down you know compilers and optimizers will never compare to the beauty and perfection of hand-woven assembly. Come, join us. Bonus: for a good time, `dx --no-optimize` – Caleb Fenton Aug 15 '15 at 00:22