I have a rather complex, but extremely well-tested assembly language x86-32 application running on variety of x86-32 and x86-64 boxes. This is a runtime system for a language compiler, so it supports the execution of another compiled binary program, the "object code".
It uses Windows SEH to catch various kinds of traps: division by zero, illegal access, ... and prints a register dump using the context information provided by Windows, that shows the state of the machine at the time of the trap. (It does lots of other stuff irrelevant to the question, such as printing a function backtrace or recovering from the division by zero as appropriate). This allows the writer of the "object code" to get some idea what went wrong with his program.
It behaves differently on two Windows 7-64 systems, that are more or less identical, on what I think is an illegal memory access. The specific problem is that the "object code" (not the well-tested runtime system) somewhere stupidly loads 0x82 into EIP; that is a nonexistent page in the address space AFAIK. I expect a Windows trap though the SEH, and expect to a register dump with EIP=00000082 etc.
On one system, I get exactly that register dump. I could show it here, but it doesn't add anything to my question. So, it is clear the SEH in my runtime system can catch this, and display the situation. This machine does not have any MS development tools on it.
On the other ("mystery") system, with the same exact binaries for runtime system and object code, all I get is the command prompt. No further output. FWIW, this machine has MS Visual Studio 2010 on it. The mystery machine is used heavily for other purposes, and shows no other funny behaviors in normal use.
I assume the behavior difference is caused by a Windows configuration somewhere, or something that Visual Studio controls. It isn't the DEP configuration the system menu; they are both configured (vanilla) as "DEP for standard system processes". And my runtime system executable has "No (/NXCOMPAT:NO)" configured.
Both machines are i7 but different chips, 4 cores, lots of memory, different motherboards. I don't think this is relevant; surely both of these CPUs take traps the same way.
The runtime system includes the following line on startup:
SetErrorMode(SEM_FAILCRITICALERRORS | SEM_NOGPFAULTERRORBOX); // stop Windows pop-up on crashes
This was recently added to prevent the "mystery" system from showing a pop-up window, "xxx.exe has stopped working" when the crash occurs. The pop-up box behaviour doesn't happen on the first system, so all this did was push the problem into a different corner on the "mystery" machine.
Any clue where I look to configure/control this?
I provide here the SEH code I am using. It has been edited to remove a considerable amount of sanity-checking code that I claim has no effect on the apparant state seen in this code.
The top level of the runtime system generates a set of worker threads (using CreateThread) and points to execute ASMGrabGranuleAndGo; each thread sets up its own SEH, and branches off to a work-stealing scheduler, RunReadyGranule. To the best of my knowledge, the SEH is not changed after that; at least, the runtime system and the "object code" do not do this, but I have no idea what the underlying (e.g, standard "C") libraries might do.
Further down I provide the trap handler, TopLevelEHFilter. Yes, its possible the register printing machinery itself blows up causing a second exception. I'll try to check into this again soon, but IIRC my last attempt to catch this in the debugger on the mystery machine, did not pass control to the debugger, just got me the pop up window.
public ASMGrabGranuleAndGo
ASSUME FS:NOTHING ; cancel any assumptions made for this register
ASMGrabGranuleAndGo:
;Purpose: Entry for threads as workers in PARLANSE runtime system.
; Each thread initializes as necessary, just once,
; It then goes and hunts for work in the GranulesQ
; and start executing a granule whenever one becomes available
; install top level exception handler
; Install handler for hardware exceptions
cmp gCompilerBreakpointSet, 0
jne HardwareEHinstall_end ; if set, do not install handler
push offset TopLevelEHFilter ; push new exception handler on Windows thread stack
mov eax, [TIB_SEH] ; expected to be empty
test eax, eax
BREAKPOINTIF jne
push eax ; save link to old exception handler
mov fs:[TIB_SEH], esp ; tell Windows that our exception handler is active for this thread
HardwareEHinstall_end:
;Initialize FPU to "empty"... all integer grains are configured like this
finit
fldcw RTSFPUStandardMode
lock sub gUnreadyProcessorCount, 1 ; signal that this thread has completed its initialization
@@: push 0 ; sleep for 0 ticks
call MySleep ; give up CPU (lets other threads run if we don't have enuf CPUs)
lea esp, [esp+4] ; pop arguments
mov eax, gUnreadyProcessorCount ; spin until all other threads have completed initialization
test eax, eax
jne @b
mov gThreadIsAlive[ecx], TRUE ; signal to scheduler that this thread now officially exists
jmp RunReadyGranule
ASMGrabGranuleAndGo_end:
;-------------------------------------------------------------------------------
TopLevelEHFilter: ; catch Windows Structured Exception Handling "trap"
; Invocation:
; call TopLevelEHFilter(&ReportRecord,&RegistrationRecord,&ContextRecord,&DispatcherRecord)
; The arguments are passed in the stack at an offset of 8 (<--NUMBER FROM MS DOCUMENT)
; ESP here "in the stack" being used by the code that caused the exception
; May be either grain stack or Windows thread stack
extern exit :proc
extern syscall @RTSC_PrintExceptionName@4:near ; FASTCALL
push ebp ; act as if this is a function entry
mov ebp, esp ; note: Context block is at offset ContextOffset[ebp]
IF_USING_WINDOWS_THREAD_STACK_GOTO unknown_exception, esp ; don't care what it is, we're dead
; *** otherwise, we must be using PARLANSE function grain stack space
; Compiler has ensured there's enough room, if the problem is a floating point trap
; If the problem is illegal memory reference, etc,
; there is no guarantee there is enough room, unless the application is compiled
; with -G ("large stacks to handle exception traps")
; check what kind of exception
mov eax, ExceptionRecordOffset[ebp]
mov eax, ExceptionRecord.ExceptionCode[eax]
cmp eax, _EXCEPTION_INTEGER_DIVIDE_BY_ZERO
je div_by_zero_exception
cmp eax, _EXCEPTION_FLOAT_DIVIDE_BY_ZERO
je float_div_by_zero_exception
jmp near ptr unknown_exception
float_div_by_zero_exception:
mov ebx, ContextOffset[ebp] ; ebx = context record
mov Context.FltStatusWord[ebx], CLEAR_FLOAT_EXCEPTIONS ; clear any floating point exceptions
mov Context.FltTagWord[ebx], -1 ; Marks all registers as empty
div_by_zero_exception: ; since RTS itself doesn't do division (that traps),
; if we get *here*, then we must be running a granule and EBX for granule points to GCB
mov ebx, ContextOffset[ebp] ; ebx = context record
mov ebx, Context.Rebx[ebx] ; grain EBX has to be set for AR Allocation routines
ALLOCATE_2TOK_BYTES 5 ; 5*4=20 bytes needed for the exception structure
mov ExceptionBufferT.cArgs[eax], 0
mov ExceptionBufferT.pException[eax], offset RTSDivideByZeroException ; copy ptr to exception
mov ebx, ContextOffset[ebp] ; ebx = context record
mov edx, Context.Reip[ebx]
mov Context.Redi[ebx], eax ; load exception into thread's edi
GET_GRANULE_TO ecx
; This is Windows SEH (Structured Exception Handler... see use of Context block below!
mov eax, edx
LOOKUP_EH_FROM_TABLE ; protected by DelayAbort
TRUST_JMP_INDIRECT_OK eax
mov Context.Reip[ebx], eax
mov eax, ExceptionContinueExecution ; signal to Windows: "return to caller" (we've revised the PC to go to Exception handler)
leave
ret
TopLevelEHFilter_end:
unknown_exception:
<print registers, etc. here>