2

I have a 64-bit program that works with the VirtualBox COM interface and implements a frontend for the virtual machine. Recently I started getting weird "An invalid or unaligned stack was encountered during an unwind operation" exceptions, and I would like to at least understand what causes them. As I understand it, the stack needs to be 16-byte aligned, so I presume an unaligned stack pointer is the likely cause. But all my program does is implement a couple of COM interfaces using the STDMETHOD macros from ATL, which should use the correct calling convention, so how could I mess up the stack?

Here is an example of the call stack when the issue occurs:

ntdll.dll!00007ffe679ac0b4() Unknown
ntdll.dll!00007ffe67913356() Unknown
msvcrt.dll!__longjmp_internal() Unknown
> VBoxREM.dll!000000006fb0f3c4() Unknown

I tried to google the __longjmp_internal symbol but did not find anything useful. Does it indicate that an exception unwind is in progress?

Any pointers on how to approach debugging this issue, or comments on what could mess up the stack alignment, are welcome. I understand that an exact solution is probably impossible here because VirtualBox is involved.

Rudolfs Bundulis
  • I can't help much, but this is [longjmp](http://www.cplusplus.com/reference/csetjmp/longjmp/) – ChrisWard1000 Oct 28 '14 at 09:46
  • Invalid is much more likely than unaligned. Have you verified that alignment is the problem or are you just guessing? There are more ways to corrupt the stack than there are programmers in the world. Dangling or uninitialised pointers and buffer overruns are probably the most common. Which one you're guilty of is impossible to guess. – molbdnilo Oct 28 '14 at 10:00
  • @molbdnilo I'm guessing, since I was expecting to get access violation or something of that sort if I had a dangling pointer or did a memory overrun, but of course I cannot rule those out. Guess I'll have to do some memory profiling. – Rudolfs Bundulis Oct 28 '14 at 10:06

3 Answers

4

I've faced this baffling problem recently.

I know it only started happening after I switched from the static C/C++ runtime to the DLL version, so that probably means the static version didn't do stack unwinding.

I then traced the assembly code for longjmp() and noticed that one of the first conditional branches tests _JUMP_BUFFER.Frame.

If it is 0, longjmp() restores a bunch of registers and returns.

Aha! So that must mean that if _JUMP_BUFFER.Frame = 0, unwinding is disabled. I tried it and indeed, problem solved.

I then tried to observe what Frame should be when a setjmp()/longjmp() pair succeeds. I found that Frame usually equals the stack pointer, but when the unwinding fails, Frame != SP. So I tried setting Frame to SP, and that also eliminates the exception.

I don't know why that works. I know in the SYSV x86-64 ABI, the frame pointer is optional. Maybe setjmp() needs a proper frame pointer and isn't getting one?
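
As a minimal sketch of the Frame = 0 workaround described above: the helper name disable_unwind_on_longjmp and the function jump_without_unwind are hypothetical, and the cast assumes the MSVC x64 <setjmp.h>, which is where _JUMP_BUFFER and its Frame member come from. On other toolchains the helper is deliberately a no-op, so the code still builds. With unwinding skipped, any cleanup that the unwind would normally run between the longjmp() call and the setjmp() site is presumably skipped too, so treat this as a diagnostic aid rather than a fix.

```c
#include <assert.h>
#include <setjmp.h>

/* Hypothetical helper: on MSVC x64, zero the saved frame so that longjmp()
   takes the "restore registers and return" branch instead of unwinding.
   Everywhere else this does nothing. */
static void disable_unwind_on_longjmp(jmp_buf env)
{
    assert(env != 0);
#if defined(_MSC_VER) && defined(_M_X64)
    /* Assumption: _JUMP_BUFFER is the MSVC-internal layout behind jmp_buf. */
    ((_JUMP_BUFFER *)env)->Frame = 0;
#else
    (void)env;
#endif
}

/* Returns 1 when control comes back through longjmp(). */
int jump_without_unwind(void)
{
    jmp_buf env;

    if (setjmp(env) != 0) {
        return 1;               /* arrived here via longjmp() */
    }

    disable_unwind_on_longjmp(env);   /* must happen after setjmp() */
    longjmp(env, 1);
    return 0;                   /* not reached */
}
```

Calling jump_without_unwind() should return 1. The important detail is the ordering: Frame has to be cleared after setjmp() has filled the buffer and before longjmp() reads it.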

Yale Zhang
  • Whoah, if that really does the trick it would be enormously great:) Thanks, will check this. – Rudolfs Bundulis Aug 23 '16 at 11:45
  • What exactly do you mean with "unwinding fails" - an exception during unwinding? If such thing happens I get an abnormal termination and do not even reach the return of the jump. – Rudolfs Bundulis Aug 23 '16 at 13:04
  • I was referring to the "An invalid or unaligned stack was encountered" exception in RtlUnwindEx() when I said "unwinding fails" – Yale Zhang Aug 23 '16 at 19:18
  • Ahh, thanks:) I tried to make a minimal example but I can't see how using a static/dynamic runtime changes the unwinding - in both cases all the objects in stack created prior to calling `longjmp` are destroyed but maybe I'm missing something. – Rudolfs Bundulis Aug 23 '16 at 19:38
  • I am also having difficulty making a minimal example to submit to Microsoft. The function where longjmp() fails is absolutely huge. I find that commenting out some blocks makes the problem go away, but I can't find any pattern. Stack unwinding isn't mandatory for longjmp(), so I thought the one in the static lib doesn't do it. But I just checked and it does. Have you tried setting the frame pointer to 0 after setjmp()? ((_JUMP_BUFFER *)&cpuState)->Frame = 0; – Yale Zhang Aug 23 '16 at 20:22
  • In my case the longjmp occurs in the VirtualBox code, so to do this I'd have to patch and recompile it. Since their frontends work (and I checked, they use the static runtime), maybe it is a combination of the static runtime and some flags. Still, please keep me informed, this info is very helpful. If you do eventually submit a bug to MS, please put a link here. – Rudolfs Bundulis Aug 24 '16 at 08:11
0

Not sure if this helps, but I encountered similar problems (without VMs) with longjmp on x64 Windows.

It turned out that any kind of aligned stack data (alignment >= 32 bytes) in the same scope as the longjmp causes longjmp to fail with 0xC0000028 when compiled for x64.

#include <setjmpex.h>

void doThe_0xC0000028 ( )
{
    jmp_buf jp;

    if (!setjmp (jp))
    {
        // do some stuff ... 
        // ... then "revert" with longjmp.
        longjmp (jp, 1);
    }

    // having any aligned data on stack (align > 16) in the same scope
    // causes longjmp to go: 0xC0000028
    //------------------------------------------------------------------
    __declspec(align(32)) char buffer[12];

    // just accessing buffer somehow - this is apparently needed to generate the faulty 0xC0000028
    buffer[0];          
}

I reported this as an MSVC bug: https://connect.microsoft.com/VisualStudio/feedback/details/3136150/64bit-longjmp-causing-0xc0000028-with-aligned-stack-data

Roman Pfneudl
0

One thing to be aware of if you are calling x64 MSVCRT's setjmp through a foreign function interface is that it expects an undocumented second parameter: the stack pointer immediately before the call. This gets stored in the Frame member of the jmp_buf. If you do not explicitly pass this argument, Frame just becomes whatever happens to be in RDX. longjmp calls RtlUnwindEx, which checks that the target Frame looks like a valid stack pointer, and if it does not, it raises the STATUS_BAD_STACK exception.
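
To illustrate what "looks like a valid stack pointer" might mean, here is a rough sketch of such a sanity check. It is not the actual RtlUnwindEx logic: the function name frame_looks_like_stack_pointer, the half-open bounds test, and the 8-byte alignment requirement are all assumptions chosen for illustration. It can still be handy when debugging, for example to test a Frame value read out of a jmp_buf against the current thread's stack range.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical approximation of a "valid stack pointer" check.
   Assumptions: the frame must lie inside [stack_low, stack_high) and be
   8-byte aligned. The real check inside RtlUnwindEx may differ. */
bool frame_looks_like_stack_pointer(uintptr_t frame,
                                    uintptr_t stack_low,
                                    uintptr_t stack_high)
{
    assert(stack_low < stack_high);   /* caller must pass a sane range */

    if (frame < stack_low || frame >= stack_high) {
        return false;                 /* outside the thread's stack */
    }
    return (frame % 8u) == 0;         /* misaligned values are rejected */
}
```

A leftover value in RDX, such as a small integer or a heap pointer, would fail the bounds test here, which matches the failure mode described above. On Windows the stack range could be obtained with GetCurrentThreadStackLimits, though that call is not part of this sketch.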

Brian Nixon