
I am using Visual Studio 2008 C++ for Windows Mobile 6 ARMV4I and I'm trying to learn to read the ARM assembly code generated by VS to minimize unnecessary buffer copies within an application. So, I've created a test application that looks like this:

#include <vector>

typedef std::vector< BYTE > Buf;

class Foo
{
public:
    Foo( Buf b ) { b_.swap( b ); };
private:
    Buf b_;
};

Buf Create()
{
    Buf b( 1024 );
    b[ 0 ] = 0x0001;
    return b;
}

int _tmain( int argc, _TCHAR* argv[] )
{
    Foo f( Create() );
    return 0;
}

I'd like to understand if the buffer returned by Create is copied when given to the Foo constructor or if the compiler is able to optimize that copy away. In the Release build with optimizations turned on, this generates assembly like this:

class Foo
{
public:
    Foo( Buf b ) { b_.swap( b ); };
0001112C  stmdb       sp!, {r4 - r7, lr} 
00011130  mov         r7, r0 
00011134  mov         r3, #0 
00011138  str         r3, this 
0001113C  str         r3, [r7, #4] 
00011140  str         r3, [r7, #8] 
00011144  ldr         r3, this 
00011148  ldr         r2, this 
0001114C  mov         r5, r7 
00011150  mov         r4, r1 
00011154  str         r3, this, #4 
00011158  str         r2, this, #4 
0001115C  mov         r6, r1 
00011160  ldr         r2, this 
00011164  ldr         r3, this 
00011168  mov         lr, r7 
0001116C  str         r3, this 
00011170  str         r2, this 
00011174  ldr         r2, [lr, #8]! 
00011178  ldr         r3, [r6, #8]! 
0001117C  str         r3, this 
00011180  str         r2, this 
00011184  ldr         r3, this 
00011188  movs        r0, r3 
0001118C  beq         |Foo::Foo + 0x84 ( 111b0h )| 
00011190  ldr         r3, [r1, #8] 
00011194  sub         r1, r3, r0 
00011198  cmp         r1, #0x80 
0001119C  bls         |Foo::Foo + 0x80 ( 111ach )| 
000111A0  bl          000112D4 
000111A4  mov         r0, r7 
000111A8  ldmia       sp!, {r4 - r7, pc} 
000111AC  bl          |stlp_std::__node_alloc::_M_deallocate ( 11d2ch )| 
000111B0  mov         r0, r7 
000111B4  ldmia       sp!, {r4 - r7, pc} 
--- ...\stlport\stl\_vector.h -----------------------------
// snip!
--- ...\asm_test.cpp
    private:
        Buf b_;
    };

Buf Create()
{
00011240  stmdb       sp!, {r4, lr} 
00011244  mov         r4, r0 
    Buf b( 1024 );
00011248  mov         r1, #1, 22 
0001124C  bl          |    
    b[ 0 ] = 0x0001;
00011250  ldr         r3, [r4] 
00011254  mov         r2, #1 
    return b;
}

int _tmain( int argc, _TCHAR* argv[] )
{
00011264  str         lr, [sp, #-4]! 
00011268  sub         sp, sp, #0x18 
    Foo f( Create() );
0001126C  add         r0, sp, #0xC 
00011270  bl          |Create ( 11240h )| 
00011274  mov         r1, r0 
00011278  add         r0, sp, #0 
0001127C  bl          |Foo::Foo ( 1112ch )| 
    return 0;
00011280  ldr         r0, argc 
00011284  cmp         r0, #0 
00011288  beq         |wmain + 0x44 ( 112a8h )| 
0001128C  ldr         r3, [sp, #8] 
00011290  sub         r1, r3, r0 
00011294  cmp         r1, #0x80 
00011298  bls         |wmain + 0x40 ( 112a4h )| 
0001129C  bl          000112D4 
000112A0  b           |wmain + 0x44 ( 112a8h )| 
000112A4  bl          |stlp_std::__node_alloc::_M_deallocate ( 11d2ch )| 
000112A8  mov         r0, #0 
}

What patterns can I look for in the assembly code to understand where the Buf structure is being copied?

Jonathan Leffler
PaulH
  • Just by looking at the C++ code, I can see that `Buf` is copied at two places: the `Foo` constructor, and when returned from the `Create` function. – Etienne de Martel Apr 26 '11 at 15:37
  • @Etienne de Martel - The `Buf` returned by the create function should be optimized away by RVO. The compiler may be able to optimize away the copy in the `Foo` constructor, too. I don't know. I'm trying to understand how I can read the assembly to find out what optimizations are applied. – PaulH Apr 26 '11 at 15:43
  • @Heandel - That may be true if no optimizations are applied or if you know all the optimizations your compiler can apply and in what situations they will be applied. I do not. – PaulH Apr 26 '11 at 15:43
  • @PaulH Are you trying to do micro-optimization? – Etienne de Martel Apr 26 '11 at 15:49
  • @Etienne - a reply to OP's previous question advised use of reference on the Create call to avoid one of those copies, fyi. RVO may avoid the second, depending on compiler. – Steve Townsend Apr 26 '11 at 16:41
  • You may also have to look for calls to `memcpy`. A good compiler should re-use library functions for copying large items. – Thomas Matthews Apr 26 '11 at 16:42
  • @PaulH In theory, optimizations may change with the next revision of the compiler. Best approach would be to design the code to minimize copies as suggested above. – Lou Apr 26 '11 at 22:04

2 Answers

0

Analyzing Create is fairly straightforward, because the code is so short. NRVO has clearly been applied here: the return statement generated no instructions, and the return value is constructed in place in the caller-provided storage whose address arrives in r0.

The copy that would take place for Foo::Foo's pass-by-value parameter is slightly harder to analyze, but there's very little code between the calls to Create and Foo::Foo where the copy would have to take place (just mov r1, r0 and add r0, sp, #0), and nothing that would do a deep copy of a std::vector. So it looks like that copy has been eliminated as well. The other possibility is a custom calling convention for Foo::Foo where the argument is actually passed by reference and copied inside the function. You'd need someone capable of deeper ARM assembly analysis than I am to rule that out.
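
One way to cross-check a reading of the assembly is to count copy-constructor calls at run time. The following is a minimal sketch, not the code from the question: CountingBuf and CopiesDuringConstruction are hypothetical names, and CountingBuf stands in for Buf so that its copy constructor can be instrumented. It is written to compile as C++03, on the assumption that this matches the VS2008 toolchain.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical stand-in for Buf: wraps a vector and counts copy constructions.
struct CountingBuf {
    static int copies;
    std::vector<unsigned char> data;

    explicit CountingBuf(std::size_t n) : data(n) {}
    CountingBuf(const CountingBuf& other) : data(other.data) { ++copies; }
    void swap(CountingBuf& other) { data.swap(other.data); }
};
int CountingBuf::copies = 0;

// Same shape as the Foo in the question: take by value, then swap.
class Foo {
public:
    Foo(CountingBuf b) : b_(0) { b_.swap(b); }
    std::size_t size() const { return b_.data.size(); }
private:
    CountingBuf b_;
};

CountingBuf Create() {
    CountingBuf b(1024);
    b.data[0] = 0x01;
    return b;  // NRVO candidate
}

// Hypothetical helper: returns how many copies the Create -> Foo path made.
int CopiesDuringConstruction() {
    CountingBuf::copies = 0;
    Foo f(Create());
    assert(f.size() == 1024);  // the buffer arrived intact either way
    return CountingBuf::copies;
}
```

A result of 0 would mean both the return-value copy and the parameter copy were elided. A nonzero result means at least one was not, and the count depends on the compiler and its settings.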

Ben Voigt
-2

The buffer will be copied; you are using the pass-by-value semantics of C++, and no compiler will optimize that away for you. How it's copied will depend on the copy constructor of std::vector.

tworivers
  • The C++ standard specifically allows elision of copy constructor calls. – Ben Voigt Apr 29 '11 at 23:18
  • http://stackoverflow.com/questions/2143787/what-is-copy-elision-and-how-does-it-optimize-the-copy-and-swap-idiom – tworivers Apr 30 '11 at 05:04
  • @user539312: Copy elision is one of the few optimizations allowed to change behavior. "When certain criteria are met, an implementation is allowed to omit the copy/move construction of a class object, **even if the copy/move constructor and/or destructor for the object have side effects**. In such cases, the implementation treats the source and target of the omitted copy/move operation as simply two different ways of referring to the same object, and the destruction of that object occurs at the later of the times when the two objects would have been destroyed without the optimization." – Ben Voigt Apr 30 '11 at 15:05
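
The quoted rule can be illustrated with a small sketch in which the copy constructor and destructor have visible side effects. Tracked, Make, Consume and LifetimesBalance are hypothetical names introduced only for this illustration. How many copies occur is left to the compiler, but constructions and destructions should balance whether or not any copy is elided.

```cpp
#include <cassert>

// Hypothetical type whose special members have observable side effects.
struct Tracked {
    static int constructed;  // every constructor call, copies included
    static int copied;       // copy-constructor calls only
    static int destroyed;    // destructor calls

    Tracked() { ++constructed; }
    Tracked(const Tracked&) { ++constructed; ++copied; }
    ~Tracked() { ++destroyed; }
};
int Tracked::constructed = 0;
int Tracked::copied = 0;
int Tracked::destroyed = 0;

Tracked Make() {
    Tracked t;
    return t;  // NRVO candidate: the copy into the return value may be omitted
}

void Consume(Tracked) {}  // by-value parameter: the copy from a temporary may be omitted

// Returns true when every constructed object was destroyed and the only
// constructions beyond the original were copies.
bool LifetimesBalance() {
    Tracked::constructed = Tracked::copied = Tracked::destroyed = 0;
    {
        Consume(Make());
    }
    assert(Tracked::copied <= 2);  // at most one copy per elidable site
    return Tracked::constructed == Tracked::destroyed
        && Tracked::constructed == 1 + Tracked::copied;
}
```

Tracked::copied may legitimately be 0, 1 or 2 after the call, which is the "allowed to change behavior" part of the rule. The balance check holds in every case.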