Structured Exception Handler catches near-zero EIP trap differently on nearly identical machines?

Question

I have a rather complex, but extremely well-tested, x86-32 assembly language application running on a variety of x86-32 and x86-64 boxes. This is a runtime system for a language compiler, so it supports the execution of another compiled binary program, the "object code".

It uses Windows SEH to catch various kinds of traps: division by zero, illegal access, ... and prints a register dump using the context information provided by Windows, that shows the state of the machine at the time of the trap. (It does lots of other stuff irrelevant to the question, such as printing a function backtrace or recovering from the division by zero as appropriate). This allows the writer of the "object code" to get some idea what went wrong with his program.

It behaves differently on two Windows 7-64 systems that are more or less identical, on what I think is an illegal memory access. The specific problem is that the "object code" (not the well-tested runtime system) somewhere stupidly loads 0x82 into EIP; that is a nonexistent page in the address space AFAIK. I expect a Windows trap through the SEH, and expect to see a register dump with EIP=00000082 etc.

On one system, I get exactly that register dump. I could show it here, but it doesn't add anything to my question. So, it is clear the SEH in my runtime system can catch this, and display the situation. This machine does not have any MS development tools on it.

On the other ("mystery") system, with the same exact binaries for runtime system and object code, all I get is the command prompt. No further output. FWIW, this machine has MS Visual Studio 2010 on it. The mystery machine is used heavily for other purposes, and shows no other funny behaviors in normal use.

I assume the behavior difference is caused by a Windows configuration somewhere, or something that Visual Studio controls. It isn't the DEP configuration in the system menu; they are both configured (vanilla) as "DEP for standard system processes", and my runtime system executable has "No (/NXCOMPAT:NO)" configured.

Both machines are i7 but different chips, 4 cores, lots of memory, different motherboards. I don't think this is relevant; surely both of these CPUs take traps the same way.

The runtime system includes the following line on startup:

SetErrorMode(SEM_FAILCRITICALERRORS | SEM_NOGPFAULTERRORBOX); // stop Windows pop-up on crashes

This was recently added to prevent the "mystery" system from showing a pop-up window, "xxx.exe has stopped working" when the crash occurs. The pop-up box behaviour doesn't happen on the first system, so all this did was push the problem into a different corner on the "mystery" machine.

Any clue where I look to configure/control this?

I provide here the SEH code I am using. It has been edited to remove a considerable amount of sanity-checking code that I claim has no effect on the apparent state seen in this code.

The top level of the runtime system generates a set of worker threads (using CreateThread) and points them at ASMGrabGranuleAndGo; each thread sets up its own SEH, and branches off to a work-stealing scheduler, RunReadyGranule. To the best of my knowledge, the SEH is not changed after that; at least, the runtime system and the "object code" do not do this, but I have no idea what the underlying (e.g., standard "C") libraries might do.

Further down I provide the trap handler, TopLevelEHFilter. Yes, it's possible the register printing machinery itself blows up, causing a second exception. I'll try to check into this again soon, but IIRC my last attempt to catch this in the debugger on the mystery machine did not pass control to the debugger; it just got me the pop-up window.

public ASMGrabGranuleAndGo
ASSUME FS:NOTHING ; cancel any assumptions made for this register
ASMGrabGranuleAndGo:
;Purpose: Entry for threads as workers in PARLANSE runtime system.
;   Each thread initializes as necessary, just once,
;   It then goes and hunts for work in the GranulesQ
;   and start executing a granule whenever one becomes available

; install top level exception handler

; Install handler for hardware exceptions
cmp        gCompilerBreakpointSet, 0
jne        HardwareEHinstall_end ; if set, do not install handler
push       offset TopLevelEHFilter ; push new exception handler on Windows thread stack
mov        eax, [TIB_SEH]    ; expected to be empty
test       eax, eax
BREAKPOINTIF jne
push       eax               ; save link to old exception handler
mov        fs:[TIB_SEH], esp ; tell Windows that our exception handler is active for this thread
HardwareEHinstall_end:

;Initialize FPU to "empty"... all integer grains are configured like this   
finit
fldcw      RTSFPUStandardMode

lock sub   gUnreadyProcessorCount, 1  ; signal that this thread has completed its initialization

@@: push       0                           ; sleep for 0 ticks
call       MySleep                     ; give up CPU (lets other threads run if we don't have enuf CPUs)
lea        esp, [esp+4]                ; pop arguments
mov        eax, gUnreadyProcessorCount ; spin until all other threads have completed initialization
test       eax, eax
jne        @b

mov        gThreadIsAlive[ecx], TRUE ; signal to scheduler that this thread now officially exists
jmp        RunReadyGranule    
ASMGrabGranuleAndGo_end:

;-------------------------------------------------------------------------------

TopLevelEHFilter: ; catch Windows Structured Exception Handling "trap"
; Invocation:
;   call  TopLevelEHFilter(&ReportRecord,&RegistrationRecord,&ContextRecord,&DispatcherRecord)
;         The arguments are passed in the stack at an offset of 8 (<--NUMBER FROM MS DOCUMENT)
;   ESP here "in the stack" being used by the code that caused the exception
;   May be either grain stack or Windows thread stack
extern exit :proc
extern syscall @RTSC_PrintExceptionName@4:near ; FASTCALL

push       ebp                     ; act as if this is a function entry
mov        ebp, esp                ; note: Context block is at offset ContextOffset[ebp]

IF_USING_WINDOWS_THREAD_STACK_GOTO unknown_exception, esp ; don't care what it is, we're dead
    ; *** otherwise, we must be using PARLANSE function grain stack space
    ; Compiler has ensured there's enough room, if the problem is a floating point trap
    ; If the problem is illegal memory reference, etc,
    ; there is no guarantee there is enough room, unless the application is compiled 
    ; with -G ("large stacks to handle exception traps")

; check what kind of exception 
mov        eax, ExceptionRecordOffset[ebp]
mov        eax, ExceptionRecord.ExceptionCode[eax]
cmp        eax, _EXCEPTION_INTEGER_DIVIDE_BY_ZERO
je         div_by_zero_exception
cmp        eax, _EXCEPTION_FLOAT_DIVIDE_BY_ZERO
je         float_div_by_zero_exception
jmp        near ptr unknown_exception  

float_div_by_zero_exception:
mov        ebx, ContextOffset[ebp] ; ebx = context record
mov        Context.FltStatusWord[ebx], CLEAR_FLOAT_EXCEPTIONS    ; clear any floating point exceptions
mov        Context.FltTagWord[ebx], -1 ; Marks all registers as empty
div_by_zero_exception: ; since RTS itself doesn't do division (that traps),
; if we get *here*, then we must be running a granule and EBX for granule points to GCB
mov        ebx, ContextOffset[ebp] ; ebx = context record

mov        ebx, Context.Rebx[ebx] ; grain EBX has to be set for AR Allocation routines
ALLOCATE_2TOK_BYTES 5             ; 5*4=20 bytes needed for the exception structure
mov        ExceptionBufferT.cArgs[eax], 0
mov        ExceptionBufferT.pException[eax], offset RTSDivideByZeroException    ; copy ptr to exception

mov        ebx, ContextOffset[ebp] ; ebx = context record
mov        edx, Context.Reip[ebx]
mov        Context.Redi[ebx], eax  ; load exception into thread's edi

GET_GRANULE_TO ecx

; This is Windows SEH (Structured Exception Handler... see use of Context block below! 

mov        eax, edx
LOOKUP_EH_FROM_TABLE   ; protected by DelayAbort
TRUST_JMP_INDIRECT_OK eax
mov        Context.Reip[ebx], eax

mov        eax, ExceptionContinueExecution ; signal to Windows: "return to caller" (we've revised the PC to go to Exception handler)
leave
ret

TopLevelEHFilter_end:

unknown_exception:
<print registers, etc. here>

Do you actually have anything mapped at 0x00000082 (it would be unusual)? If not, attempting to execute there is a regular access violation - not a DEP or NX error. — nobody, Sep 11 '14 at 21:53
Have you used a debugger to step through and see what's happening with the faulting code? It's entirely possible that - in addition to screwing up EIP - it's stomping your SEH handlers. — nobody, Sep 11 '14 at 21:58
@AndrewMedico: An NX error is a regular access violation. In `EXCEPTION_RECORD` docs you'll find the following note: "`EXCEPTION_ACCESS_VIOLATION` The first element of the array contains a read-write flag that indicates the type of operation that caused the access violation. If this value is zero, the thread attempted to read the inaccessible data. If this value is 1, the thread attempted to write to an inaccessible address. If this value is 8, the thread causes a user-mode data execution prevention (DEP) violation." — Ben Voigt, Sep 11 '14 at 21:58
@AndrewMedico No, my address space is vanilla; I take what Windows offers me. In particular, I make no attempt to force the allocation of pages anywhere; I'd expect Windows to stop me from allocating a page at VM address zero anyway, although I've never done the experiment. OK, so I'm getting a regular access violation (let me go find that register dump, it actually tells me ....) — Ira Baxter, Sep 11 '14 at 22:03
@AndrewMedico: Hmm. I think I tried this, and IIRC on the mystery system, although I could attach to the process in MSVC before it died, when it did die I got the pop-up window effect, not a transfer of control to the debugger. That's why I added the "SetErrorMode". I'll go back and try this again. Even so, why would the behavior be different on different machines? (Yes, it may actually be that differences in memory layouts somehow cause this.) — Ira Baxter, Sep 11 '14 at 22:06
I have to agree, it is a long rambling story that makes little sense. Doesn't have anything to do with DEP, this always generates a plain AV, SEH exception code 0xc0000005. We can't tell how it is trapped, no code to look at, there's more than one way. It mixes WER into the problem for mysterious reasons. The trivial explanation is that the customer's SEH handling got ahead of yours, nothing to demonstrate that wouldn't be the case. This question needs a lot of work to get it beyond the "it doesn't work" state it is in now. — Hans Passant, Sep 11 '14 at 22:36
@HansPassant: Thank you for your insight and your assumptions. There *isn't* any "customer's SEH" other than the one in the runtime system I built. I could have supplied an acre of code but the question wasn't about that; I think I demonstrated the code was reasonably competent by virtue of producing the right thing on one machine. The question wasn't about my code; it is about *what configurations* Windows uses/inspects to control the effect of this code. I will insert that code if it makes you happy. I agree it wasn't about DEP; we sorted that out pretty fast. — Ira Baxter, Sep 11 '14 at 23:07
(Readers: the original formulation of this question had my hint the problem might be related to DEP. Other than being a kind of trap, it does not appear to be involved. I have revised the question accordingly). — Ira Baxter, Sep 13 '14 at 04:26
I am surprised at the apparent hostility of people to this problem. It is real enough. The fact that it is hard to explain is part of the problem. Downvoting the question isn't helping to arrive at a solution. — Ira Baxter, Feb 11 '15 at 01:11
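
For reference, here is a minimal sketch in C of how a handler could decode the access-type flag quoted from the `EXCEPTION_RECORD` docs in Ben Voigt's comment above. The helper name `av_access_kind` and the enum names are made up for illustration and are not part of the runtime system shown in the question; only the values 0, 1, and 8 come from the quoted documentation. The sketch avoids `windows.h` so the decoding logic can be checked on any machine.

```c
#include <stdint.h>

/* Values documented for EXCEPTION_RECORD.ExceptionInformation[0]
   when ExceptionCode is EXCEPTION_ACCESS_VIOLATION (0xC0000005). */
enum {
    AV_READ        = 0,  /* thread tried to read inaccessible data   */
    AV_WRITE       = 1,  /* thread tried to write inaccessible data  */
    AV_DEP_EXECUTE = 8   /* user-mode DEP (execute) violation        */
};

/* Hypothetical helper: name the kind of access that faulted. */
const char *av_access_kind(uintptr_t info0)
{
    switch (info0) {
    case AV_READ:        return "read";
    case AV_WRITE:       return "write";
    case AV_DEP_EXECUTE: return "execute (DEP)";
    default:             return "unknown";
    }
}
```

In a real handler the argument would be `ExceptionInformation[0]` from the exception record, and `ExceptionInformation[1]` would hold the faulting address. Printing both in the register dump on each machine would show whether the two systems report the same kind of fault for the jump to 0x82; as noted in the thread, the flag might differ between "read" and "execute" depending on whether DEP is in effect.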

Ben Voigt · Answer 1 · 2020-12-21T16:21:21.423

1

"DEP for standard system processes" won't help you; it's internally known as "OptIn". What you need is the IMAGE_DLLCHARACTERISTICS_NX_COMPAT flag set in the PE header of your .exe file. Or call the SetProcessDEPPolicy function in kernel32.dll. The SetProcessMitigationPolicy function would also be good, but it isn't available until Windows 8.

There's some nice explanation on Ed Maurer's blog, which explains both how .NET uses DEP (which you won't care about) but also the system rules (which you do).

BIOS settings can also affect whether hardware NX is available.

edited Dec 21 '20 at 16:21

answered Sep 11 '14 at 21:41

Ben Voigt


I'm using the same .exe files on both machines; they obviously have this flag set the same way by virtue of being identical. How can this end up different? (I'm going off to check what my MS DEV build configuration is...) – Ira Baxter Sep 11 '14 at 21:43
@IraBaxter: If you want DEP traps, you need to enable them. Right now you are opted out. Perhaps in your test environment which you say is also configured for opt-in, your program is inheriting opt-in from the development environment? (e.g. vshost process or something like that) – Ben Voigt Sep 11 '14 at 21:48
(Somebody dinged your answer, sheesh, not me, I'm pleased you are trying to help!). Well, OK, yes, maybe I have the diagnostic wrong, and it isn't DEP. EIP=0x82 is down in page which simply doesn't exist in the address space according to my understanding, so an "illegal address" trap ought to occur on both machines. Changing subject title correspondingly. Where is the "opt-in" information found? – Ira Baxter Sep 11 '14 at 21:53
@IraBaxter: Yeah you should get an access violation either way, although it might vary between (unable to execute) vs (unable to read). And I don't think DEP would eat access violations in favor of simply terminating the process, but it is supposed to be a security feature so it might. – Ben Voigt Sep 11 '14 at 21:55
@Ira: Of note: "The SetProcessDEPPolicy function overrides the system DEP policy for the current process unless its DEP policy was specified at process creation." How to set that policy at process creation I haven't yet discovered. But it does imply that system configuration isn't the only thing that matters, so does the `CreateProcess` call. – Ben Voigt Sep 11 '14 at 21:57
(I have revised my question to de-emphasize DEP). So I agree I should get an access violation either way. I also agree that a default security policy for Windows might decide that DEP, or a PC in a ridiculous place, would be enough to terminate a process. But I have two systems that behave *differently*. I conclude this is configurable somewhere. – Ira Baxter Sep 11 '14 at 21:59
@Ira: Well what do you get from `GetProcessDEPPolicy(GetCurrentProcess())` ? – Ben Voigt Sep 11 '14 at 22:01
@Ira: Ahh, here's how it can be affected from outside your process, apart from system configuration: "`PROCESS_CREATION_MITIGATION_POLICY_DEP_ENABLE (0x00000001)` Enables data execution prevention (DEP) for the child process." – Ben Voigt Sep 11 '14 at 22:04
Wow, that's a pretty crazy set of control knobs. Let me stare at that awhile. – Ira Baxter Sep 11 '14 at 22:14
The "Ed Maurer's blog" link is now dead :( Either way, thank you for the very informative reply. – Violet Giraffe Dec 21 '20 at 15:16
@VioletGiraffe: I was able to find an archived copy and fix the link. Thanks for bringing that to my attention. – Ben Voigt Dec 21 '20 at 16:21
Thank you! I didn't think of using web.archive for this, sorry, otherwise I'd fix it myself. A good tool to have indeed. – Violet Giraffe Dec 21 '20 at 21:36
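
A rough sketch, in C, of the `GetProcessDEPPolicy` check suggested in the comments above. It is an illustration under stated assumptions, not code from the runtime system in the question: the names `describe_dep_flags`, `report_dep_policy`, and the two `DEP_*_FLAG` macros are invented here (the macros mirror the documented `PROCESS_DEP_ENABLE` and `PROCESS_DEP_DISABLE_ATL_THUNK_EMULATION` values). The Windows-only part is guarded by `_WIN32` so the flag-decoding logic can be compiled anywhere.

```c
#include <stdio.h>

/* Flag values documented for the lpFlags output of GetProcessDEPPolicy.
   Renamed here to avoid clashing with the definitions in windows.h. */
#define DEP_ENABLE_FLAG                 0x00000001UL /* PROCESS_DEP_ENABLE */
#define DEP_NO_ATL_THUNK_EMULATION_FLAG 0x00000002UL /* PROCESS_DEP_DISABLE_ATL_THUNK_EMULATION */

/* Hypothetical helper: turn the flags word into a short description. */
const char *describe_dep_flags(unsigned long flags)
{
    if ((flags & DEP_ENABLE_FLAG) == 0)
        return "DEP disabled";
    if (flags & DEP_NO_ATL_THUNK_EMULATION_FLAG)
        return "DEP enabled, ATL thunk emulation disabled";
    return "DEP enabled";
}

#ifdef _WIN32
#include <windows.h>

/* Hypothetical helper: print the DEP policy of the current process.
   Returns 0 on success, -1 if the query fails. */
int report_dep_policy(void)
{
    DWORD flags = 0;
    BOOL permanent = FALSE;

    if (!GetProcessDEPPolicy(GetCurrentProcess(), &flags, &permanent)) {
        fprintf(stderr, "GetProcessDEPPolicy failed: %lu\n", GetLastError());
        return -1;
    }
    printf("%s (%s)\n", describe_dep_flags(flags),
           permanent ? "permanent" : "can still be changed");
    return 0;
}
#endif
```

Calling `report_dep_policy()` once at startup on both machines, with the same binaries, would show whether the two processes really start with the same DEP policy, or whether something outside the executable (for example the process-creation mitigation policy mentioned above) makes them differ.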
