Good day!

I'm writing a JIT assembly generator in C++ using Xbyak.

The problem appears in the prologue and epilogue. The last thing I do in the prologue is write the Xmm register values to the stack. After that I don't use the stack pointer until the epilogue, so I assumed I didn't need to update it.

The old code looked like this:

// prologue
push( rbp ); // emulate ENTER
mov( rbp, rsp ); // emulate ENTER
push( regNumSteps );
push( retTemp );
for( int i = 6; i <= 15; i++ ) {
    vmovaps( ptr[rsp - 16 - ( i - 6 ) * 16], Xmm( i ) );
}

// epilogue
for( int i = 15; i >= 6; i-- ) {
    vmovaps( Xmm( i ), ptr[rsp - 16 - ( i - 6 ) * 16] );
}
pop( retTemp );
pop( regNumSteps );
leave();

This code worked well, but under valgrind it led to:

  • invalid write of size 8 when storing Xmm14 and Xmm15 in the prologue (why 8? Xmm registers are 16 bytes)
  • invalid read of size 16 when loading Xmm15 and Xmm14 in the epilogue

My guess was that valgrind doesn't like stack accesses made without moving rsp properly.

After that I rewrote the loops in the prologue and epilogue as follows:

// prologue
for( int i = 6; i <= 15; i++ ) {
    sub( rsp, 16 ); // keep rsp up-to-date
    vmovaps( ptr[rsp], Xmm( i ) );
}
// epilogue
for( int i = 15; i >= 6; i-- ) {
    vmovaps( Xmm( i ), ptr[rsp] );
    add( rsp, 16 ); // keep rsp up-to-date
}

And it passed valgrind!

The question is WHY? Was my guess even correct, or is it just a coincidence?

I've looked through valgrind's options but wasn't able to find anything related to this. Googling didn't help either...

FedyuninV
    You seem to be working under the SysV ABI, which only provides a 128-byte red zone that you can access below `rsp`; that is why `xmm14` triggers valgrind. PS: you can of course adjust `rsp` once at the start; you don't need to do it in 16-byte increments. – Jester Apr 19 '23 at 22:10
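
The arithmetic behind this comment can be checked with a short C++ sketch. It is only an illustration, not part of the original JIT: the helper names (`store_offset`, `in_red_zone`, `frame_bytes`, `print_report`) are made up for this example, and it assumes the 128-byte red zone of the System V x86-64 ABI mentioned in the comment. It recomputes the offsets used by the old prologue, `rsp - 16 - (i - 6) * 16`, and checks which stores stay within 128 bytes below `rsp`.

```cpp
#include <cstdio>

// Assumption: System V x86-64 ABI, where the 128 bytes below rsp
// (the red zone) may be used without adjusting rsp.
constexpr int kRedZoneBytes = 128;
constexpr int kXmmBytes = 16;

// Offset from rsp of the store for Xmm(i) in the old prologue:
// ptr[rsp - 16 - (i - 6) * 16]
constexpr int store_offset(int i) { return -16 - (i - 6) * kXmmBytes; }

// The store covers [offset, offset + 16). Its upper end never exceeds rsp
// for i >= 6, so only the lower end has to stay within the red zone.
constexpr bool in_red_zone(int i) { return store_offset(i) >= -kRedZoneBytes; }

// Bytes to reserve with a single `sub rsp, N` for Xmm(first)..Xmm(last).
constexpr int frame_bytes(int first, int last) {
    return (last - first + 1) * kXmmBytes;
}

// Compile-time checks of the boundary cases.
static_assert(in_red_zone(13), "xmm13 ends exactly at rsp - 128");
static_assert(!in_red_zone(14), "xmm14 starts at rsp - 144");

// Prints one line per saved register with its offset and red-zone status.
void print_report() {
    for (int i = 6; i <= 15; i++) {
        std::printf("xmm%-2d at rsp%+d: %s\n", i, store_offset(i),
                    in_red_zone(i) ? "inside red zone" : "below red zone");
    }
    std::printf("single adjustment: sub rsp, %d\n", frame_bytes(6, 15));
}
```

Calling `print_report()` from a `main` lists Xmm6 through Xmm13 as inside the red zone and Xmm14 and Xmm15 as below it, which matches the two registers valgrind complained about. Under the same assumptions, reserving all ten slots with one `sub rsp, 160` and addressing them as `rsp + (i - 6) * 16` would replace the ten separate 16-byte adjustments.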