Exceptions on unmanaged threads in .NET

Question

How do I handle situations where my app is terminating, using a callback that runs prior to termination?

The .NET handlers do not work in the following scenario. Is SetUnhandledExceptionFilter the correct choice? It appears to have the shortcomings discussed below.

Scenario

I want to respond to all cases of app termination with a message and an error report to our service in our .NET app.

However, I have a WPF app in which two of our testers get unhandled exceptions that bypass:

AppDomain.UnhandledException (most importantly)
Application.ThreadException
Dispatcher.UnhandledException

The handlers are marked SecurityCritical and HandleProcessCorruptedStateExceptions, and legacyCorruptedStateExceptionsPolicy is set to true in the app.config.

My two examples in the wild

VirtualBox running Windows 10 throws somewhere inside vboxd3d.dll when initialising WPF (turning off VirtualBox 3D acceleration "fixes" it).
A Windows 8 machine with a suspicious "run on graphics card A/B" option in the system context menu crashes somewhere (:/) during WPF startup, but only when anti-cracking tools are applied.

Either way, when live, the app must respond to these kinds of failures prior to termination.

I can reproduce this with an unmanaged exception that occurs on an unmanaged thread started by a P/Invoked method in .NET:

test.dll

BOOL APIENTRY DllMain( HMODULE hModule,
                       DWORD  ul_reason_for_call,
                       LPVOID lpReserved
                     )
{
    switch (ul_reason_for_call)
    {
    case DLL_PROCESS_ATTACH:
    case DLL_THREAD_ATTACH:
    case DLL_THREAD_DETACH:
    case DLL_PROCESS_DETACH:
        break;
    }
    return TRUE;
}

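DWORD WINAPI myThread(LPVOID lpParameter)
{
    // Deliberately read an invalid address to raise an access violation
    // on a thread that the CLR did not create.
    long testfail = *(long*)(-9022);
    return 1;
}

extern "C" __declspec(dllexport) void test()
{
    DWORD tid;
    HANDLE myHandle = CreateThread(0, 0, myThread, NULL, 0, &tid);
    WaitForSingleObject(myHandle, INFINITE);
    CloseHandle(myHandle);
}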

app.exe

class TestApp
{
    [DllImport("kernel32.dll")]
    static extern FilterDelegate SetUnhandledExceptionFilter(FilterDelegate lpTopLevelExceptionFilter);

    [UnmanagedFunctionPointer(CallingConvention.StdCall)]
    delegate int FilterDelegate(IntPtr exception_pointers);

    static int Win32Handler(IntPtr nope)
    {
        MessageBox.Show("Native uncaught SEH exception"); // show + report or whatever
        Environment.Exit(-1); // exit and avoid WER etc
        return 1; // EXCEPTION_EXECUTE_HANDLER; never reached because of the Environment.Exit call above
    }

    [DllImport("test.dll")]
    static extern void test();

    [STAThread]
    public static void Main(string[] args)
    {
        AppDomain.CurrentDomain.UnhandledException += new UnhandledExceptionEventHandler(CurrentDomain_UnhandledException);
        SetUnhandledExceptionFilter(Win32Handler);
        test(); // This is caught by Win32Handler, not CurrentDomain_UnhandledException
    }
    [SecurityCritical, HandleProcessCorruptedStateExceptions ]
    static void CurrentDomain_UnhandledException(object sender, UnhandledExceptionEventArgs e)
    {
        Exception ex = e.ExceptionObject as Exception;
        MessageBox.Show(ex.ToString()); // show + report or whatever
        Environment.Exit(-1); // exit and avoid WER etc
    }
}

This handles the failure in vboxd3d.dll in a bare WPF test app, which of course also has the WPF Dispatcher and WinForms Application (why not) exception handlers registered.

Updates

- In the production code I am trying to use this in, the handler appears to get overwritten by some other caller. I can get around that by calling the method every 100 ms, which is stupid of course.
- On the machine with the vboxd3d.dll problem, doing the above replaces the exception with one in clr.dll.
- It appears that at the time of the crash, the managed function pointer passed into kernel32 is no longer valid. Setting the handler from a native helper DLL, which registers a native function, appears to be working. The managed function is a static method, so I'm not sure pinning applies here; perhaps the CLR is in the process of terminating...
- Indeed, the managed delegate was being collected. No "overwriting" of the handler was occurring. I've added this as an answer; not sure what to accept or what the SO convention is here...

This makes little sense. You don't "start a method" with CreateProcess(). That creates a process; any exceptions inside that process are unobservable to code in your process. – Hans Passant, Jun 17 '15 at 18:19
Are you interested in the stack of the unmanaged exception? If yes, then why not use WinDbg? It will give you the stack trace of the Win32 threads. In fact, the Windows debugging tools do a great job. – Mrinal Kamboj, Jun 17 '15 at 18:26
Also, in case you have the relevant unmanaged code along with PDB files, enable the relevant exception category in VS under Debug > Exceptions; it contains Win32 exceptions and a couple of other categories. – Mrinal Kamboj, Jun 17 '15 at 18:28
@MrinalKamboj This is for client-side handling of crashes. I've edited the question; does it make more sense now? – David Higgins, Jun 18 '15 at 10:35
@HansPassant Sorry, I meant CreateThread of course! Clearly not enough coffee. The question has been improved now (I hope)? – David Higgins, Jun 18 '15 at 10:39
@DavidHeffernan Fair enough; the problem, really, is about reporting client crashes. I've edited the question, let me know if you still can't tell what my problem is! – David Higgins, Jun 18 '15 at 10:41

score 2 · Answer 1 · answered Jun 22 '15 at 10:54

The problem with the code in the question was this:

SetUnhandledExceptionFilter(Win32Handler);

Since a delegate is created automatically, this is equivalent to:

FilterDelegate del = new FilterDelegate(Win32Handler);
SetUnhandledExceptionFilter(del);

The problem is that the GC can collect the delegate, and the native->managed thunk created for it, at any point after its final reference. So:

SetUnhandledExceptionFilter(Win32Handler);
GC.Collect();
native_crash_on_unmanaged_thread();

causes a nasty crash, because the handler passed into kernel32.dll is no longer a valid function pointer. This is remedied by keeping the delegate reachable so that the GC cannot collect it:

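public class Program
{
    // A static field is a GC root, so the delegate (and its native->managed
    // thunk) stays alive for the lifetime of the process.
    static FilterDelegate mdel;

    public static void Main(string[] args)
    {
        mdel = new FilterDelegate(Win32Handler);
        SetUnhandledExceptionFilter(mdel);
        GC.Collect();  // mdel survives: it is still reachable through the static field
        native_crash_on_unmanaged_thread();
        // With a local instead of a field, GC.KeepAlive(del) would have to go here,
        // after the last point at which native code may call the handler.
        // KeepAlive only protects the object up to the point where it is called.
    }
}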
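
The native helper DLL mentioned in the question's updates sidesteps the lifetime problem altogether, because a native function pointer is never collected. Below is a minimal sketch of what such a helper might look like. The names describe_exception, native_filter and install_native_filter are hypothetical, and the reporting step is reduced to a debug string. The Windows-specific part is guarded so that the formatting helper can be compiled and checked on any platform.

```cpp
#include <cassert>
#include <cstdint>
#include <cstdio>
#include <string>

#ifdef _WIN32
#include <windows.h>
#endif

// Hypothetical helper: turn an SEH exception code into a short report line.
// It is a pure function, so it can be exercised without crashing anything.
std::string describe_exception(std::uint32_t code)
{
    const char* name = "unknown exception";
    switch (code) {
    case 0xC0000005u: name = "access violation"; break;        // EXCEPTION_ACCESS_VIOLATION
    case 0xC00000FDu: name = "stack overflow"; break;          // EXCEPTION_STACK_OVERFLOW
    case 0xC0000094u: name = "integer divide by zero"; break;  // EXCEPTION_INT_DIVIDE_BY_ZERO
    }
    char buffer[64];
    std::snprintf(buffer, sizeof buffer, "0x%08X (%s)", static_cast<unsigned>(code), name);
    return buffer;
}

#ifdef _WIN32
// Top-level filter implemented in native code. Caveat: building a std::string
// allocates, which is risky if the heap is already corrupted; a production
// handler would prefer a preallocated buffer.
static LONG WINAPI native_filter(EXCEPTION_POINTERS* info)
{
    const std::string line = "Unhandled native exception "
        + describe_exception(info->ExceptionRecord->ExceptionCode) + "\n";
    OutputDebugStringA(line.c_str());   // stand-in for "show + report or whatever"
    return EXCEPTION_EXECUTE_HANDLER;   // usually results in process termination
}

// Hypothetical export, to be called once at startup from the managed side.
extern "C" __declspec(dllexport) void install_native_filter()
{
    SetUnhandledExceptionFilter(native_filter);
}
#endif
```

On the managed side, install_native_filter would be declared with DllImport, like test() in the question, and called once from Main. This does not stop other code from replacing the filter later.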

The other answers are also a great resource; not sure what to mark as the answer right now.

Of course, the disadvantage of this entire approach is that some shoddy code you call could subsequently replace your handler and ignore it. There is no way to protect against that. The other answers here are better in that respect. – David Higgins, Jun 22 '15 at 12:29

score 0 · Answer 2 · answered Jun 19 '15 at 14:53

I've had to deal with, shall we say, unpredictable unmanaged libraries.

If you're P/Invoking into the unmanaged code, you may have problems there. I've found it easier to use C++/CLI wrappers around the unmanaged code and in some cases, I've written another set of unmanaged C++ wrappers around the library before getting to the C++/CLI.

You might be thinking, "why on earth would you write two sets of wrappers?"

The first reason is that isolating the unmanaged code makes it easier to trap exceptions and make them more palatable.

The second is purely pragmatic: if you have a library (not a DLL) which uses the STL, you will find that the linker will magically give all code, managed and unmanaged, the CLI implementation of the STL functions. The easiest way to prevent that is to completely isolate the code that uses the STL; otherwise, every time you access a data structure through the STL in unmanaged code you end up doing multiple transitions between managed and unmanaged code, and your performance will tank. You might think to yourself, "I'm a scrupulous programmer - I'll be super careful to put #pragma managed and/or #pragma unmanaged wrappers in the right places and I'm all set." Nope, nope, and nope. Not only is this difficult and unreliable, but when (not if) you fail to do it properly, you won't have a good way to detect it.

And as always, you should ensure that whatever wrappers you write are chunky rather than chatty.

Here is a typical chunk of unmanaged code to deal with an unstable library:

try {
    // A bunch of setup code that you don't need to
    // see, reduced to this:
    SomeImageType *outImage = GetImage();
    // I was having problems with the heap getting mangled,
    // so heapcheck() is a conditional macro that calls _heapchk()
    heapcheck();
    return outImage;
}
catch (const std::bad_alloc &) {
    throw MyLib::MyLibNoMemory();
}
catch (const MyLib::MyLibFailure &)
{
    throw;  // rethrow the original exception object without copying or slicing it
}
catch (const char *)
{
    // Seriously, some code throws a string.
    throw;
}
catch (...) {
    throw MyLib::MyLibFailure(MyLib::MyFailureReason::kUnknown2);
}
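
A self-contained variation on the pattern above, as a minimal sketch: instead of rethrowing library-specific exception types, the boundary function converts every C++ exception into a status code plus a message, so nothing can propagate into the managed caller. Status, unstable_library_call and guarded_call are hypothetical names invented for this illustration. Note that this only covers C++ exceptions; SEH exceptions such as access violations are not caught by catch (...) unless the code is compiled with /EHa.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdio>
#include <cstring>
#include <exception>
#include <new>
#include <stdexcept>

// Hypothetical status codes returned across the native boundary.
enum class Status { Ok = 0, NoMemory = 1, Failure = 2, Unknown = 3 };

// Stand-in for the unstable third-party call. The mode parameter exists
// only so that each failure path can be triggered in this sketch.
int unstable_library_call(int mode)
{
    switch (mode) {
    case 1: throw std::bad_alloc();
    case 2: throw std::runtime_error("library failure");
    case 3: throw "a bare string";   // some code really does this
    case 4: throw 7;                 // anything else
    default: return 42;
    }
}

// Chunky boundary function: one call does the whole job, and no C++
// exception is allowed to escape to the caller.
Status guarded_call(int mode, int* result, char* message, std::size_t message_size) noexcept
{
    assert(result != nullptr && message != nullptr && message_size > 0);
    message[0] = '\0';
    try {
        *result = unstable_library_call(mode);
        return Status::Ok;
    }
    catch (const std::bad_alloc&) {
        return Status::NoMemory;
    }
    catch (const std::exception& e) {
        std::snprintf(message, message_size, "%s", e.what());
        return Status::Failure;
    }
    catch (const char* text) {
        std::snprintf(message, message_size, "%s", text);
        return Status::Failure;
    }
    catch (...) {
        return Status::Unknown;
    }
}
```

Returning a plain status code keeps the interface usable from either P/Invoke or a C++/CLI wrapper, which can then raise whatever managed exception it prefers.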

This is a good solution for P/Invoking unmanaged code. You will call me crazy, but the unhandled exceptions we are getting come from within the framework. I'll come back with some call stacks; I mention this near the top of the question and will clarify. – David Higgins, Jun 22 '15 at 08:48
Also, I think you could use a catch(int code) {} too, and don't forget /EHa on the compiler for the SEH exceptions inside try/catch. – David Higgins, Jun 22 '15 at 08:50

score -1 · Answer 3 · answered Jun 22 '15 at 08:59

An exception that can't be handled properly can always happen, and the process may die unexpectedly no matter how hard you try to protect it from within. However, you can monitor it from the outside.

Have another process that monitors your main process. If the main process suddenly disappears without logging an error or reporting things gracefully, the second process can do that. The second process can be a lot simpler, with no unmanaged calls at all, so chances of it disappearing all of a sudden are significantly smaller.

And as a last resort, when your processes start, check whether they shut down properly last time. If not, you can report a bad shutdown then. This will be useful if the entire machine dies.
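
The last-resort check can be sketched with a marker file: create it at startup, delete it on every orderly shutdown path, and treat a leftover marker as evidence that the previous run died. This is a minimal sketch, not a complete design. The RunMarker class and the marker path are hypothetical, and it assumes a single instance of the app and a C++17 compiler.

```cpp
#include <cassert>
#include <filesystem>
#include <fstream>
#include <system_error>
#include <utility>

namespace fs = std::filesystem;

// Hypothetical helper: records that the app is running by creating a marker
// file, and remembers whether the previous run left one behind.
class RunMarker {
public:
    explicit RunMarker(fs::path marker) : marker_(std::move(marker))
    {
        assert(!marker_.empty());
        previous_run_crashed_ = fs::exists(marker_);
        std::ofstream out(marker_);   // (re)create the marker for this run
        out << "running\n";
    }

    // Call on every orderly shutdown path; a crash never gets this far.
    void mark_clean_shutdown()
    {
        std::error_code ignored;
        fs::remove(marker_, ignored);
    }

    bool previous_run_crashed() const { return previous_run_crashed_; }

private:
    fs::path marker_;
    bool previous_run_crashed_ = false;
};
```

At startup the app would construct a RunMarker, send a "bad shutdown" report if previous_run_crashed() returns true, and call mark_clean_shutdown() just before exiting normally.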

Answered by zmbq

A good safety net, this. Won't help debugging the crash of course - but you can rollback the release, in which there must be a maddening chunk of code somewhere. – David Higgins Jun 22 '15 at 09:15
If you want to debug, let the process crash and produce a dump. Then you can see what happened. – zmbq Jun 22 '15 at 09:19
If you have access to the dump. It's possible to get it from clients via WER and the Microsoft metadata exchange. I even wrote a small hack to automate product mapping submission to Microsoft (https://github.com/flavourous/Mexer) a while ago, to be used with CI. WER is not a resource you would want to use frequently or primarily, though; these cases should hopefully be rare. – David Higgins Jun 22 '15 at 11:02
