Good day!
I'm writing JIT asm generation in C++ using Xbyak.
The problem appears in the prologue and epilogue. The last thing I do in the prologue is write the Xmm register values to the stack. After that I don't use the stack pointer until the epilogue, so I don't need to update it.
The old code looked like this:
// prologue
push( rbp ); // emulate ENTER
mov( rbp, rsp ); // emulate ENTER
push( regNumSteps );
push( retTemp );
for( int i = 6; i <= 15; i++ ) {
    // store Xmm6..Xmm15 below rsp without adjusting rsp
    vmovaps( ptr[rsp - 16 - ( i - 6 ) * 16], Xmm( i ) );
}
// epilogue
for( int i = 15; i >= 6; i-- ) {
    // reload Xmm15..Xmm6 from the same slots
    vmovaps( Xmm( i ), ptr[rsp - 16 - ( i - 6 ) * 16] );
}
pop( retTemp );
pop( regNumSteps );
leave();
This code worked correctly, but under valgrind it led to:
"invalid write of size 8" when storing Xmm14 and Xmm15 in the prologue (why 8? Xmm registers are 16 bytes)
"invalid read of size 16" when loading Xmm15 and Xmm14 in the epilogue
My guess was that valgrind doesn't like stack accesses made without moving rsp properly first.
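To make the old layout concrete, here is a small standalone sketch (not part of the JIT; the names `oldSlotOffset` and `printOldLayout` are made up for illustration) that tabulates where each register lands relative to rsp. It uses the same expression as the old prologue loop and assumes rsp is not changed between the last push and the stores.

```cpp
#include <cassert>
#include <cstdio>

// Distance below rsp at which the old prologue stores Xmm(i),
// taken from the expression ptr[rsp - 16 - (i - 6) * 16].
// Hypothetical helper, for illustration only.
constexpr int oldSlotOffset(int i) {
    return 16 + (i - 6) * 16;
}

// Print the byte range each register occupies below rsp.
void printOldLayout() {
    for (int i = 6; i <= 15; i++) {
        const int lo = oldSlotOffset(i);  // slot starts at rsp - lo
        assert(lo % 16 == 0);             // slots are 16-byte multiples below rsp
        std::printf("Xmm%-2d -> [rsp - %3d, rsp - %3d)\n", i, lo, lo - 16);
    }
}
```

Calling `printOldLayout()` from `main` lists Xmm6 at rsp - 16 down to Xmm15 at rsp - 160, so the two registers valgrind complains about, Xmm14 and Xmm15, occupy the two lowest slots, starting 144 and 160 bytes below rsp.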
After that I rewrote the loops in the prologue and epilogue as follows:
// prologue
for( int i = 6; i <= 15; i++ ) {
    sub( rsp, 16 ); // keep rsp up-to-date
    vmovaps( ptr[rsp], Xmm( i ) );
}
// epilogue
for( int i = 15; i >= 6; i-- ) {
    vmovaps( Xmm( i ), ptr[rsp] );
    add( rsp, 16 ); // keep rsp up-to-date
}
And it passed valgrind!
The question is: why? Was my guess correct, or is it just a coincidence?
I've looked through valgrind's options but wasn't able to find anything related to this. Googling didn't help either.