1

I've continued working on my program yesterday after a month long break. I didn't change anything on the code but now my application does not start anymore. At one point it just interrupts execution and seems to be stuck in a deadlock, though I'm not sure if it really is a deadlock since it happens when the method returns - at a point where it should normally not happen.

I can't show you the code since it's huge. But I can say surely that the only action beyond it's own thread is access to some UI elements which get invoked by the Dispatcher. And until yesterday everything worked fine, I haven't changed anything there.

This is the place where it happens:

    internal override Task InitializeAddIns()
    {
        try
        {
            Action action = () => this._addinProvider.InitializeAddins();
            Task t = Task.Factory.StartNew(action);
            return t;
        }
        catch (Exception ex)
        {
            Debugger.Break();
            return null;
        }
    }

Calling code:

// Initialize AddIns
splash.SplashText = "SplashScreen:step_searchAddIns".Translate();
this._addinSystem.InitializeAddIns();
splash.SplashText = "SplashScreen:step_startAddIns".Translate();
await Task.Run(() => this._addinSystem.RunAddins());

// Resolve libraries with NativeCompressor
splash.SplashText = "SplashScreen:step_resolveDependencies".Translate();

The task starts and returns 't'. The InitializeAddins()-method runs successfully to end (checked it with the debugger - logs also show that it finishes completely). The next step is that the declaration line of "action" is marked (as it finished). Then the debugging ends and nothing more happens. Not even this Dispatcher hook gets called:

Dispatcher.CurrentDispatcher.Hooks.DispatcherInactive += (sender, args) => this.Update();

My only assumption is that there's a deadlock somewhere. I can't explain otherwise why the whole execution stops and gets stuck. I just can't find any clue where to start searching. I reworked the newly introduced code and added some extended locking methods which also detect deadlocks. No deadlock detected so far.

Since I don't know what could cause the problem, I tried to use WinDbg and SOSEX to find the error source. Sadly I don't get WinDbg to run. It does check the Symbol server and the last outputs are the following:

CLRDLL: Unable to find mscordacwks_AMD64_x86_4.0.30319.34209.dll by mscorwks search CLRDLL: Unable to find 'SOS_AMD64_x86_4.0.30319.34209.dll' on the path Cannot Automatically load SOS CLRDLL: Loaded DLL mscordacwks_AMD64_x86_4.0.30319.34209.dll CLR DLL status: Loaded DLL mscordacwks_AMD64_x86_4.0.30319.34209.dll

Though it obviously loaded something, I get this message when calling SOSEX's !dlk command:

0:028> !dlk Unable to initialize .NET data interface. Version 4.0.30319.34209 of mscordacwks.dll is required. Locate and load the correct version of mscordacwks.dll. See documentation for the .cordll command. Examining CriticalSections... No deadlocks detected.

So I really don't know any further how to get this bug fixed. What could be reasons for this behaviour? I don't even get an exception. I already enabled CLR exceptions, but not even those get thrown. It's quite strange, I normally would expect that this lockdown does occur somewhere in the middle, not after the method exits...

SharpShade
  • 1,761
  • 2
  • 32
  • 45
  • Can you show us what happens inside `_addinProvider.InitializeAddins()`? – Yuval Itzchakov Nov 06 '14 at 13:14
  • It is a bit much code, around 1600 lines in total. It reads out a resource of all assemblies in the Addins-directory, extends the UI by parsing that file, creates an instance of the AddIn and returns after all are loaded. I will post the code if there's really no solution, but what would be sources for this behaviour in general, independent of which code is being executed? I mean this method runs to end without failure. Afterwards it stops. This doesn't happen if I do run it synchronously without the Task as I just noticed. But why shouldn't it work asynchronously? Deadlock? – SharpShade Nov 06 '14 at 13:25
  • It would be just nice if I could use WinDbg, but since it does load the DLL but still say it would be the wrong ... I can't find out whether there's really a deadlock -.- – SharpShade Nov 06 '14 at 13:25
  • Can you show the code calling `InitializeAddIns`? – Yuval Itzchakov Nov 06 '14 at 13:28
  • I added it in the initial post. As you see I tried a bit. Without await and Tasks the Initialization works. It reaches the third splash.SplashText line and then gets stuck again. Slowly I really believe it's a deadlock. – SharpShade Nov 06 '14 at 13:33
  • What's your symbol search path for WinDbg? – dsolimano Nov 06 '14 at 14:06
  • I read I just have to set this environment variable: _NT_SYMBOL_PATH @ srv*C:\Symbols*http://msdl.microsoft.com/download/symbols – SharpShade Nov 06 '14 at 14:17

3 Answers3

1

First step is to try and run the code synchronously without any tasks involved at all.

Second step is to check if you are awaiting correctly. For example, you are missing an await on the call to this._addinSystem.InitializeAddIns(). This means that the call to InitializeAddInsmay not finish before you call RunAddIns. You should add this:

await this._addinSystem.InitializeAddIns();

Finally, you may not be waiting correctly on the calling code. If for example you are awaiting on a voidreturning function, the call may deadlock:

// This may deadlock because you are awaiting a void returning function!
await MyMethod();

//...

public void MyMethod()
{
    await this._addinSystem.InitializeAddIns();
    await Task.Run(() => this._addinSystem.RunAddins());    
}
Jakob Christensen
  • 14,826
  • 2
  • 51
  • 81
  • Okay so if I changed the signature so that "MyMethod" returns bool, then it would surely not deadlock (as far as I implemented locking right). Correct? Btw: I missed the first await because I tried it running synchronous, and yes it worked. I may know which causes those problems. The only thing which gets called from these methods is the Splashscreen and that one is not yet threadsafe. I changed it, but since I need to dispatch I get deadlocks there (these actually get detected). As soon as I fixed that I can say if it was the Splashscreen. But that void thing is also interesting ... – SharpShade Nov 06 '14 at 14:20
  • UPDATE: Yes it was exactly that problem. The Splashscreen caused a not found deadlock. Now the UI window shows up again, but there it gets deadlocked again. I will start a major refactoring session in order to make all classes threadsafe which aren't already. – SharpShade Nov 06 '14 at 14:33
  • @SharpShade: MyMethod must return Task before you can await it. – Jakob Christensen Nov 06 '14 at 14:40
  • I know that. Previously the calling code created a task. I just tried around because I didn't know what caused this problem. But meanwhile I found out all. Thanks, though! – SharpShade Nov 06 '14 at 15:12
1

I found the source of this problem. It was my Splashscreen, a simple Window, which gets accessed by those methods in order to update the current status (which AddIn gets loaded and so on). This was absolutely not threadsafe (I wonder why it worked before...).

I changed it to the following code in all properties. Would be nice if one could check that code and confirm that it is not hacked or a bad approach, since it does look a bit tricky... But it works so far.

public string SplashText
{
    get
    {
        using (ThreadLock.Lock(_lock))
        {
            return _splashText;
        }
    }
    set
    {
        if (_dispatcher.CheckAccess())
        {
            _splashText = value;
            OnPropertyChanged();
            return;
        }
        _dispatcher.Invoke(() =>
        {
            _splashText = value;
            OnPropertyChanged();
        });
    }
}
SharpShade
  • 1,761
  • 2
  • 32
  • 45
  • 1
    What is the `ThreadLock` type? Why are you using that instead of a C# `lock` statement? As for the safety of the code, I always get a little nervous when I see a dispatcher `Invoke` and a thread synchronization lock at the same time. The synchronous `Invoke` is itself equivalent to a lock, so this is a deadlock opportunity. But as long as you are careful to only ever take that lock _inside_ a dispatcher `Invoke` call (i.e. always take the locks in the same order), it will be okay. It's just that it can be hard to make that guarantee sometimes. – Peter Duniho Nov 06 '14 at 18:13
  • ThreadLock is a custom implementation which utilizes the Monitor class as base. The ThreadLock just has some mechanics for tracking deadlocks and proofed to be essential for me. It throws a DeadlockException at for example exactly that place where I invoke. In the beginning I had the Lock around the Dispatcher which of course caused a deadlock. So Invoke automatically locks itself? So I can safely invoke those changes without having to worry that this could interrupt anyhow? Then I'll save the lock statement since it's just overhead. – SharpShade Nov 06 '14 at 18:56
  • 2
    If you only ever access the `_splashText` field on the GUI thread -- i.e. in code called directly by WPF in that thread, or in code you've explicitly dispatched onto that thread -- then yes...you don't need any other locking, since all of the accesses of that field will happen synchronously in that single thread. – Peter Duniho Nov 06 '14 at 18:58
  • Good, then I'll remove those redundant locks now. EDIT: Changed the code sample above, too. Is the Lock in the getter redundant as well? – SharpShade Nov 06 '14 at 19:43
1

Deadlock preconditions (why you have not seen the deadlock before)

There are 4 preconditions that must be met for a deadlock to occur. If one of them is missing, there won't be a deadlock. Those preconditions are:

  • Mutual Exclusion
  • No Preemption
  • Hold and Wait
  • Circular Wait

The last one could also be named "Timing". Since it depends on how Windows is assigning CPU time, you may live without deadlocks for years. It is more likely on multi core CPUs, because the circular wait is easier to achieve if two threads are really executed in parallel.

Your symbols (why you can't load SOSEX)

mscordacwks_AMD64_x86_4.0.30319.34209.dll is a file that does not exist. Please confess that you have renamed another file to that file name because you have seen WinDbg looking for it.

The name indicates that you are trying to debug a 32 bit application with a 64 bit debugger. Microsoft does not support this. You can only debug 64 bit .NET applications in 64 bit WinDbg and 32 bit .NET applications in 32 bit WinDbg (which also runs on 64 bis OS BTW).

If you have a 64 bit dump file only and can't reproduce the issue, you're unlucky. There is no way (I researched several times) to make it work and there is no way of converting the dump from 64 bit to 32 bit.

Solving the problem

Other than that, your approach of using SOSEX' !dlk is good. It should detect deadlocks caused by C# lock statements.

I do not agree to make the code run synchronous as proposed in the answer of Jakob Christensen. While you could do that in a small application, this would require too much rewriting in a larger application.

Switching to synchronous execution and back to asynchronous may lead to an undetected situation again (timing may have changed and it just became less likely to cause a deadlock).

IMHO it's better to really understand the deadlock (which needs some understanding of the Windows Internals, so you might want to read the book). Once you understand Windows Threading, you also better understand async and await.

Then I agree to Peter Duniho who said:

If you only ever access the _splashText field on the GUI thread -- i.e. in code called directly by WPF in that thread, or in code you've explicitly dispatched onto that thread -- then yes...you don't need any other locking, since all of the accesses of that field will happen synchronously in that single thread.

Note that there is not only "the GUI thread". I see more and more developers creating several UI threads, i.e. threads which have their own message queue. I hope you have only one.

Community
  • 1
  • 1
Thomas Weller
  • 55,411
  • 20
  • 125
  • 222
  • I know those conditions very well, and sure, the Timing would be a good explanation for this occurence. About the symbols: Yes I did, but just because more than one tutorials said you should take the corresponding .Net DLL of it, rename it and place it in WinDbg's directory. Indeed I was debugging with the 64bit WinDbg and my application runs currently in x86 - I was already sceptical about that, but x86 WinDbg didn't show me the threads (some warning about the system architecture or so) and 64 did - I thought it may depends on the system then which to use. I will take a look at this book. – SharpShade Nov 07 '14 at 09:26
  • I wouldn't even want to change it. It's working fine this way and must be asynchronous because I want to keep the user updated about the loading process. Sure, only one GUI thread. Wouldn't even know why I should use another one. – SharpShade Nov 07 '14 at 09:27