4

there's something weird going on here. Visual Studio is letting me know the ESP value was not properly saved but I cannot see any mistakes in the code (32-bit, windows, __stdcall)

MASM code:

.MODE FLAT, STDCALL
...
    memcpy PROC dest : DWORD, source : DWORD, size : DWORD
    MOV EDI, [ESP+04H]
    MOV ESI, [ESP+08H]
    MOV ECX, [ESP+0CH]
    AGAIN_:
    LODSB
    STOSB
    LOOP AGAIN_
    RETN 0CH
    memcpy ENDP

I am passing 12 bytes (0xC) to the stack then cleaning it up. I have confirmed by looking at the symbols the functions symbol goes like "memcpy@12", so its indeed finding the proper symbol

this is the C prototype:

extern void __stdcall * _memcpy(void*,void*,unsigned __int32);

Compiling in 32-bit. The function copies the memory (I can see in the debugger), but the stack cleanup appears not to be working

EDIT:

MASM code:

__MyMemcpy PROC _dest : DWORD, _source : DWORD, _size : DWORD
MOV EDI, DWORD PTR [ESP + 04H]
MOV ESI, DWORD PTR [ESP + 08H]
MOV ECX, DWORD PTR [ESP + 0CH]
PUSH ESI
PUSH EDI
__AGAIN:
LODSB
STOSB
LOOP __AGAIN
POP EDI
POP ESI
RETN 0CH
__MyMemcpy ENDP

C code:

extern void __stdcall __MyMemcpy(void*, void*, int);

typedef struct {
 void(__stdcall*MemCpy)(void*,void*,int);
}MemFunc;

int initmemfunc(MemFunc*f){
f->MemCpy=__MyMemcpy
}

when I call it like this I get the error:

MemFunc mf={0};
initmemfunc(&mf);
mf.MemCpy(dest,src,size);

when I call it like this I dont:

__MyMemcpy(dest,src,size)
Peter Cordes
  • 328,167
  • 45
  • 605
  • 847
Carol Victor
  • 331
  • 1
  • 7
  • its a typedef prototype. i put it in a struct and fill it with the appropriate memcpy function at runtime. the actual extern definition is the same but with an extern added and no typedef. Like i said though, the function works and copies the memory, but the stack is the issue – Carol Victor Apr 02 '21 at 15:04
  • `memcpy` is a bad name for a custom function. There's already a `memcpy` in the CRT. – rustyx Apr 02 '21 at 15:09
  • i am not importing any libraries into my project – Carol Victor Apr 02 '21 at 15:12
  • I am sorry for not mentioning as i forgot but i have the options for both prologue and epilogue generation already disabled and always have, its the first thing i always write – Carol Victor Apr 02 '21 at 16:08
  • @MichaelPetch your exact code copy pasted works perfectly, but the code the assembler produces is slower and more bloated, it even uses leave at the end for some reason despite not using enter at the beginning. The only question that remains is why my code does not work, as it should – Carol Victor Apr 02 '21 at 16:41
  • 1
    `leave` doesn't need to start with `enter`. Leave by itself is the equivalent of `mov esp, ebp` `pop ebp` and is often the preferred choice over the 2 separate instructions. In the real world situations `leave` is probably similar to the speed of the 2 separate instructions and takes only one byte. I don't know why you use PROC at all except to add the `@12` on the end if you don't use any of the features. – Michael Petch Apr 02 '21 at 16:50
  • I have figured out where the issue lays but I have no clue why it is happening or how to fix it. I make a typedef of the function's prototype and put it inside a struct then fill it in with the address of the given function at runtime but it seems that when i call it from the struct it does not work, but when i call it normally it does – Carol Victor Apr 02 '21 at 17:02
  • @MichaelPetch I have updated my OP. I did not think for a second that could be the issue giving the exception I was getting, but now I see that is the case. – Carol Victor Apr 02 '21 at 17:10
  • Your code works perfectly, but it seems that is not the problem, its something regarding the struct I am placing it in – Carol Victor Apr 02 '21 at 17:14
  • 2
    @CarolVictor: BTW, using `lodsb` / `stosb` in a loop using the slow `loop` instruction will be *much* slower than `rep movsb` (which can use Fast Strings microcode to actually copy 16 or 32 bytes per clock cycle). – Peter Cordes Apr 02 '21 at 18:08

2 Answers2

4

The reason why the stack is corrupted is that MASM "secretly" inserts the prologue code to your function. When I added the option to disable that, the function works for me now.

You can see this, when you switch to assembly mode while still in the C code and then step into your function. It seems that VS doesn't swtich to assembly mode when already in the assembly source.

.586
.MODEL FLAT,STDCALL
OPTION PROLOGUE:NONE 

.CODE

mymemcpy PROC dest:DWORD, src:DWORD, sz:DWORD
    MOV EDI, [ESP+04H]
    MOV ESI, [ESP+08H]
    MOV ECX, [ESP+0CH]
AGAIN_:
    LODSB
    STOSB
    LOOP AGAIN_
    RETN 0CH
mymemcpy ENDP

END
Devolus
  • 21,661
  • 13
  • 66
  • 113
  • i tried to push them at the beginning and pop them at the end but no luck – Carol Victor Apr 02 '21 at 15:09
  • I tried your function and it works now. See my update. I still think that `ESI`/`EDI` must be saved though. – Devolus Apr 02 '21 at 15:50
  • 1
    Sorry I see you added the option for no prologue. I should point out you should use the name of the arguments anyway and allow the assembler to generate them for you. If you use `ret` instead of `ret 0ch` and you change the calling convention the proper prologue code will generate the appropriate `ret` instruction for the calling convention automatically. With STDCALl, MASM will change `ret` to `ret 0ch` automatically. When using PROC my opinion is you may as well use it with the power it affords or don't bother with it at all. – Michael Petch Apr 02 '21 at 15:58
  • You can also get MASM to generate the PUSH and POP code for you for registers using `USES`. I would have done it something like this: https://pastebin.com/raw/aDZUUtYR . Of course I would have used `REP MOVSB` rather than a LODSB/STOSB+loop combination, but I was just producing the code the OP had. – Michael Petch Apr 02 '21 at 16:20
  • 2
    @MichaelPetch: If you're trying to learn how things *really* work under the hood, all this MASM magic seems like the last thing you'd want. Once you understand what MASM is doing for you (e.g. from checking the disassembly), then sure use it when the calling convention is no longer the interesting part, but knowing how to turn all this crap off seems like a good idea for a beginner trying to actually understand asm calling conventions. (Or just not using PROC at all, like you suggest.) – Peter Cordes Apr 02 '21 at 18:10
  • 2
    You have to realize that the PROC directive was born out of an age where segmentation was an issue. The PROC directive became very convenient to generate appropriate code for different code models without rewriting the same function. That is less useful today in the flat model but the PROC directive does do Calling convention name mangling as a bonus on the function which is why I think it often gets used. As I commented above -if you don't use the power of PROC, may as not bother with it at all. – Michael Petch Apr 02 '21 at 18:21
  • @MichaelPetch: Also from an age where lots of things were written in asm, whether the people doing so really liked asm or not. These days, most people messing around with asm are doing it to learn asm / how things work, or for performance reasons, not because they have to. Good point about name mangling to match the `ret imm16`, but other than that MASM's other features to make asm more "high level" are kind of relics of an era of large-scale development in asm. I'd never recommend anyone use its IF or WHILE directives; that could hide optimization possibilities vs. using cmp / jcc yourself. – Peter Cordes Apr 02 '21 at 19:34
4

Since you have provided an update to your question and comments suggesting you disable prologue and epilogue code generation for functions created with the MASM PROC directive I suspect your code looks something like this:

.MODEL FLAT, STDCALL
OPTION PROLOGUE:NONE
OPTION EPILOGUE:NONE

.CODE

__MyMemcpy PROC _dest : DWORD, _source : DWORD, _size : DWORD
    MOV EDI, DWORD PTR [ESP + 04H]
    MOV ESI, DWORD PTR [ESP + 08H]
    MOV ECX, DWORD PTR [ESP + 0CH]
    PUSH ESI
    PUSH EDI
__AGAIN:
    LODSB
    STOSB
    LOOP __AGAIN
    POP EDI
    POP ESI
    RETN 0CH
__MyMemcpy ENDP

END

A note about this code: beware that if your source and destination buffers overlap this can cause problems. If the buffers don't overlap then what you are doing should work. You can avoid this by marking the pointers __restrict. __restrict is an MSVC/C++ extension that will act as a hint to the compiler that the argument doesn't overlap with another. This can allow the compiler to potentially warn of this situation since your assembly code is unsafe for that situation. Your prototypes could have been written as:

extern void __stdcall __MyMemcpy( void* __restrict, void* __restrict, int);

typedef struct {
    void(__stdcall* MemCpy)(void* __restrict, void* __restrict, int);
}MemFunc;

You are using PROC but not taking advantage of any of the underlying power it affords (or obscures). You have disabled PROLOGUE and EPILOGUE generation with the OPTION directive. You properly use RET 0Ch to have the 12 bytes of arguments cleaned from the stack.

From a perspective of the STDCALL calling convention your code is correct as it pertains to stack usage. There is a serious issue in that the Microsoft Windows STDCALL calling convention requires the caller to preserve all the registers it uses except EAX, ECX, and EDX. You clobber EDI and ESI and both need to be saved before you use them. In your code you save them after their contents are destroyed. You have to push both ESI and EDI on the stack first. This will require you adding 8 to the offsets relative to ESP. Your code should have looked like this:

__MyMemcpy PROC _dest : DWORD, _source : DWORD, _size : DWORD
    PUSH EDI                         ; Save registers first
    PUSH ESI
    MOV EDI, DWORD PTR [ESP + 0CH]   ; Arguments are offset by an additional 8 bytes
    MOV ESI, DWORD PTR [ESP + 10H]
    MOV ECX, DWORD PTR [ESP + 14H]
__AGAIN:
    LODSB
    STOSB
    LOOP __AGAIN
    POP ESI                          ; Restore the caller (non-volatile) registers
    POP EDI
    RETN 0CH
__MyMemcpy ENDP

You asked the question why it appears you are getting an error about ESP or a stack issue. I assume you are getting an error similar to this:

enter image description here

This could be a result of either ESP being incorrect when mixing STDCALL and CDECL calling conventions or it can arise out of the value of the saved ESP being clobbered by the function. It appears in your case it is the latter.

I wrote a small C++ project with this code that has similar behaviour to your C program:

#include <iostream>

extern "C" void __stdcall __MyMemcpy( void* __restrict, void* __restrict, int);

typedef struct {
    void(__stdcall* MemCpy)(void* __restrict, void* __restrict, int);
}MemFunc;

int initmemfunc(MemFunc* f) {
    f->MemCpy = __MyMemcpy;
    return 0;
}

char buf1[] = "Testing";
char buf2[200];

int main()
{
    MemFunc mf = { 0 };
    initmemfunc(&mf);
    mf.MemCpy(buf2, buf1, strlen(buf1));
    std::cout << "Hello World!\n" << buf2;
}

When I use code like yours that doesn't properly save ESI and EDI I discovered this in the generated assembly code displayed in the Visual Studio C/C++ debugger:

enter image description here

I have annotated the important parts. The compiler has generated C runtime checks (these can be disabled, but they will just hide the problem and not fix it) including a check of ESP across a STDCALL function call. Unfortunately it relies on saving the original value of ESP (before pushing parameters) into the register ESI. As a result a runtime check is made after the call to __MyMemcpy to see if ESP and ESI are still the same value. If they aren't you get the warning about ESP not being saved correctly.

Since your code incorrectly clobbers ESI (and EDI) the check fails. I have annotated the debug output to hopefully provide a better explanation.


You can avoid the use of a LODSB/STOSB loop to copy data. There is an instruction that just this very operation (REP MOVSB) that copies ECX bytes pointed to by ESI and copies them to EDI. A version of your code could have been written as:

__MyMemcpy PROC _dest : DWORD, _source : DWORD, _size : DWORD
    PUSH EDI                         ; Save registers first
    PUSH ESI
    MOV EDI, DWORD PTR [ESP + 0CH]   ; Arguments are offset by an additional 8 bytes
    MOV ESI, DWORD PTR [ESP + 10H]
    MOV ECX, DWORD PTR [ESP + 14H]
    REP MOVSB
    POP ESI                          ; Restore the caller (non-volatile) registers
    POP EDI
    RETN 0CH
__MyMemcpy ENDP

If you were to use the power of PROC to save the registers ESI and EDI you could list them with the USES directive. You can also reference the argument locations on the stack by name. You can also have MASM generate the proper EPILOGUE sequence for the calling convention by simply using ret. This will clean the up the stack appropriately and in the case of STDCALL return by removing the specified number of bytes from the stack (ie ret 0ch) in this case since there are 3 4-byte arguments.

The downside is that you do have to generate the PROLOGUE and EPILOGUE code that can make things more inefficient:

.MODEL FLAT, STDCALL
.CODE

__MyMemcpy  PROC USES ESI EDI dest : DWORD, source : DWORD, size : DWORD
    MOV EDI, dest
    MOV ESI, source
    MOV ECX, size
    REP MOVSB                        ; Use instead of LODSB/STOSB+Loop  
    RET
__MyMemcpy  ENDP

END

The assembler would generate this code for you:

PUBLIC __MyMemcpy@12
__MyMemcpy@12:
    push        ebp  
    mov         ebp,esp            ; Function prologue generate by PROC
    push        esi                ; USES caused assembler to push EDI/ESI on stack
    push        edi  
    mov         edi,dword ptr [ebp+8]  
    mov         esi,dword ptr [ebp+0Ch]  
    mov         ecx,dword ptr [ebp+10h]  
    rep movs    byte ptr es:[edi],byte ptr [esi]
    ; MASM generated this from the simple RET instruction to restore registers,
    ; clean up stack and return back to caller per the STDCALL calling convention
    pop         edi                ; Assembler
    pop         esi  
    leave  
    ret         0Ch  

Some may rightly argue that having the assembler obscure all this work makes the code potentially harder to understand for someone who doesn't realize the special processing MASM can do with a PROC declared function. This may result in harder to maintain code for someone else that is unfamiliar with MASM's nuances in the future. If you don't understand what MASM may generate, then sticking to coding the body of the function yourself is probably a safer bet. As you have found that also involves turning PROLOGUE and EPILOGUE code generation off.

Michael Petch
  • 46,082
  • 8
  • 107
  • 198
  • 2
    "Please be aware that if your source and destination buffers overlap this can cause problems." This is true. However, the observable behaviour of `rep movsb` is almost exactly the same as the original `lodsb` \ `stosb` \ `loop` sequence. In particular, overlapping buffers succeed or fail in the same way. – ecm Apr 02 '21 at 19:38
  • 1
    @ecm yes. I should have actually moved that comment out of that section. I had meant to actually note this issue for all his code not just my version of it using `rep movsb` . – Michael Petch Apr 02 '21 at 19:40
  • 1
    Thanks for your comment, you have solved my issue! It is rather annoying you have to save all those registers as it essentially slows down your code unnecessarily, even if just by half a nanosecond. – Carol Victor Apr 06 '21 at 13:58
  • @CarolVictor The overhead of the save and restore of the non-volatile registers you clobber is a necessary evil of the calling convention. That overhead is probably dwarfed by the overhead of an actual mem copy. – Michael Petch Apr 06 '21 at 15:11