3

I have a .NET application that I developed on a Windows 8.1 machine using Visual Studio Express 2008, compiled for .NET 4.0.

It runs fine on the Windows 8.1 machine, but on a (very) old single core XP machine it occasionally throws an AccessViolationException, and I cannot figure out why.

Running inside Visual Studio in debug mode, I get nothing helpful.

The program is very parallel and I am using the TPL.

The Event log shows this (which means nothing to me):

Stack:
    at System.Windows.Forms.UnsafeNativeMethods.DispatchMessageW(MSG ByRef)
    at System.Windows.Forms.Application+ComponentManager.System.Windows.Forms.UnsafeNativeMethods.IMsoComponentManager.FPushMessageLoop(IntPtr, Int32, Int32)
    at System.Windows.Forms.Application+ThreadContext.RunMessageLoopInner(Int32, System.Windows.Forms.ApplicationContext)
    at System.Windows.Forms.Application+ThreadContext.RunMessageLoop(Int32, System.Windows.Forms.ApplicationContext)
    at Microsoft.VisualBasic.ApplicationServices.WindowsFormsApplicationBase.OnRun()
    at Microsoft.VisualBasic.ApplicationServices.WindowsFormsApplicationBase.DoApplicationModel()
    at Microsoft.VisualBasic.ApplicationServices.WindowsFormsApplicationBase.Run(System.String[])

The only libraries I'm using outside of the standard .NET ones are System.Data.SQLite and Newtonsoft.Json.

The application uses the JSON library to access an RPC-Post API.

Any ideas what bit of my code might be causing this? As I said, it only happens on the old XP machine, but it could be a race condition that I only see because that machine is much slower. I don't even know where to start!

Corvus
  • Kindly post a code snippet so people can look and suggest – hellowahab Mar 23 '15 at 17:11
  • @hellowahab I can't. As I said in the question, I don't know what code causes the error. If I could post a code snippet I could answer my own question. I need hints on where to look. I was hoping someone could decode the event log. – Corvus Mar 23 '15 at 17:44
  • To me your problem seems somewhat connected to Task Parallel Library running on XP. – hellowahab Mar 23 '15 at 19:24
  • Do you have XP with updates installed? – hellowahab Mar 23 '15 at 19:25
  • @hellowahab Yes, it could be the TPL. Anything specific you had in mind? – Corvus Mar 23 '15 at 21:28
  • If I had to guess, I'd say something in a handler isn't properly synchronized and is causing an access violation. Note that you get a managed exception, i.e. you're at the IL level. It's most likely not a bug in the framework itself. – Blindy Mar 23 '15 at 21:35
  • If you're using the TPL you could try attaching exception handlers to each task. Perhaps you'd then get more details about which task it's coming from. – Mike Parkhill Mar 25 '15 at 23:42
  • @MikeParkhill If you mean attaching a function with `Task.ContinueWith` and `TaskContinuationOptions.OnlyOnFaulted`, then all tasks already have that. That code does thread-safe logging of every error, and it isn't triggering. Either I cannot catch AccessViolationException this way, or the error is coming from a different bit of the framework? – Corvus Mar 26 '15 at 09:07
  • Yeah, that's what I meant. A lot of times these exceptions come from trying to interact with the UI from a background thread. Have you looked for that? – Mike Parkhill Mar 26 '15 at 11:37
  • @MikeParkhill wouldn't you expect to get an InvalidCrossThreadCall exception then? I'm not entirely sure HOW I look for it on a background thread - nor what sort of thread would cause this and not leave me in any user code. The full stack trace above means nothing to me. All of the routines in the trace appear to be framework. Anyone know what RunMessageLoopInner is? This seems to be where it enters unsafe code. – Corvus Mar 26 '15 at 14:12

2 Answers

6

I'll noodle about this problem for a bit. It is pretty important to realize that you cannot get an answer with the info you posted; I can only talk about what you need to do to discover more information about this crash.

The most important detail is that the crash did not occur in the DispatchMessageW() method itself. There are a large number of stack frames on top of the trace you posted, but you cannot see them: they belong to unmanaged code, and the CLR only records trace information for managed code. DispatchMessage() is a workhorse winapi function that does many different things in different cases. Its primary job is to call the window procedure of a window, the code that handles a specific message for that window.

What is clear from the trace is that the crash was not caused by any .NET code. That is expected, since .NET is very good at avoiding AccessViolationExceptions. There are a few controls that you could use on your form that could be responsible. At the top of that list are ActiveX controls, WebBrowser, and the shell dialogs like OpenFileDialog. These are all controls that are implemented in native code and have a very thin .NET wrapper to make them usable in a .NET project. They are normally pretty well behaved. But then this is an old machine that has been subjected to who-knows-what. Such machines tend to be infected pretty badly with all kinds of "helpful" software that injects itself into any process, and this one probably hasn't been maintained in a long time.

You mention "very parallel", that tends to be a red flag. No strong signal, the crash occurs on the UI thread of the program, not a worker thread. But it doesn't exclude it, you could be running code on a worker that does something with the window in an illegal way and destabilizes it. Causing a subsequent crash. If you've been faithfully using the debugger without intentionally suppressing InvalidOperationException and don't create any windows on a worker thread then this isn't a strong lead.
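To make the worker-thread concern concrete, here is a minimal sketch of marshaling a UI update back to the UI thread. It is only an illustration, not a diagnosis of this crash: `MainForm`, `statusLabel` and `SetStatus` are hypothetical names, and the real program's tasks will look different. Note also that the cross-thread check that raises InvalidOperationException is on by default only when a debugger is attached, so a release run on the XP machine may not report such a call.

```vb
Imports System
Imports System.Threading.Tasks
Imports System.Windows.Forms

' Hypothetical form; MainForm, statusLabel and SetStatus are illustrative names.
Public Class MainForm
    Inherits Form

    Private ReadOnly statusLabel As New Label() With {.Dock = DockStyle.Top}

    Public Sub New()
        Controls.Add(statusLabel)
    End Sub

    Protected Overrides Sub OnShown(ByVal e As EventArgs)
        MyBase.OnShown(e)
        ' Stand-in for background work that finishes on a thread-pool thread.
        Task.Factory.StartNew(Sub() SetStatus("Worker finished"))
    End Sub

    ' Safe to call from any thread: re-posts itself to the UI thread when needed.
    Private Sub SetStatus(ByVal text As String)
        If statusLabel.InvokeRequired Then
            ' Called from a worker thread: queue the call on the thread that owns the control.
            statusLabel.BeginInvoke(New Action(Of String)(AddressOf SetStatus), text)
        Else
            ' Already on the UI thread: touching the control is legal here.
            statusLabel.Text = text
        End If
    End Sub
End Class

Public Module Program
    <STAThread()>
    Public Sub Main()
        Application.EnableVisualStyles()
        Application.Run(New MainForm())
    End Sub
End Module
```

Auditing every place where a task touches a control (including the WebBrowser) and routing it through a helper like this at least rules out the managed side as the source of the instability.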

To get down to the root cause, you need to use an unmanaged debugger so you can see exactly where the crash occurs. That tends to be rough in more than one way, like not being able to get to the machine when it bombs. In that case you need to ask the user to create a minidump of the crashed process. XP makes this painful as well: it isn't a built-in feature, and you'll have a hard time using the minidump if your machine doesn't boot XP. SysInternals' ProcDump utility is useful to record one.

Once you receive one from the customer, you'll need to open it in a debugger and inspect it to find the reason for the crash. That's going to be rough if you can't make sense of the stack trace you see now, so be sure to ask for help from team members who know more about Windows internals. Google "how to debug a minidump" to learn more; the minimal MSDN how-to page is here.

All in all, do not expect miracles here. This is going to take at least a month out of your life climbing several steep learning curves, with no guarantee of success. That inspires the secondary approach: if your app is stable on any modern Windows version but not on one or two XP machines, then this, arguably, stops being your problem. Time for the user to update his machines. Good luck with it.

Hans Passant
  • Thank you so much for this answer, it contains many helpful things (shame you community wiki'd it: it is worth some bounty). I never expected to get an answer to the bug itself, I just needed advice on how to think about tackling it. That said, I am starting to be convinced by your argument that supporting XP is just too hard, and since I can in principle force an upgrade, that might be easier. The application does have a WebBrowser control, and some pages that can be displayed in that may/are running scripts... – Corvus Mar 28 '15 at 21:26
  • I've accepted this answer, as I think it has the most general advice about dealing with this sort of error, but I gave the bounty to the next answer, since this is CW, and the next answer addresses the likely specific error in my case. – Corvus Apr 02 '15 at 08:21
0

There are some potentially useful answers here: C# WebBrowser Control System.AccessViolationException

None have been accepted, but that is because the OP is no longer working on the affected system. The ones with the most upvotes, ignoring the one that the OP discounted as irrelevant, are:

  • Re-register the jscript.dll library
  • Surround any code blocks that access the web browser control with mutex locks (this one even mentions that it was in response to the exception on Windows XP 32-bit only)
Community
matthewk