1

Please read the questions before marking as duplicate. This question implements the solution provided in this question and still experiences a deadlock.

I am debugging a large multi-threaded application that makes many concurrent calls to various Google APIs using the .Net client libraries and we are occasionally experiencing deadlocks during some requests. Some information about the application:

  • Application is a Windows service (SynchronizationContext is null)
  • .Net Framework version is 4.5
  • All threads making API requests are not in the default thread pool.

Specifically, we are calling the Execute() method which uses an async method and blocks while waiting for the result

public TResponse Execute()
{
    try
    {
        using (var response = ExecuteUnparsedAsync(CancellationToken.None).Result)
        {
            return ParseResponse(response).Result;
        }
    }
    ...
}

This in turn calls ExecuteUnparsedAsync() which executes HttpClient.SendAsync()

private async Task<HttpResponseMessage> ExecuteUnparsedAsync(CancellationToken cancellationToken)
{
    using (var request = CreateRequest())
    {
        return await service.HttpClient.SendAsync(request, cancellationToken).ConfigureAwait(false);
    }
}

Now, I understand that there are many ways we can experience deadlocks here, and that it would be best to change the application to use async methods instead to avoid them. Unfortunately, that would be a significant time investment that is not possible at this moment but may be in the future.

My specific issue is that all calling threads are not in the thread pool and because ConfigureAwait(false) is being called I expect that the continuations would always be run in the thread pool, but that is not the case. Instead it appears that the continuations are being scheduled on the original calling thread and the thread deadlocks because it is waiting on the result.

Using the following MCVE I am able to produce a deadlock within a few hours.

using Google.Apis.Auth.OAuth2;
using Google.Apis.Drive.v2;
using System;
using System.Collections.Generic;
using System.Diagnostics;
using System.IO;
using System.Linq;
using System.Net;
using System.Text;
using System.Threading;
using System.Threading.Tasks;

namespace DeadlockTest
{
    class Program
    {
        const int NUM_THREADS = 70;

        static long[] s_lastExecute = new long[NUM_THREADS];
        static long count = 0;

        static void Main(string[] args)
        {
            ServicePointManager.DefaultConnectionLimit = 50;

            for(int i = 0; i < s_lastExecute.Length; i++)
            {
                s_lastExecute[i] = DateTime.Now.ToBinary();
            }

            Thread deadlockCheck = new Thread(new ThreadStart(CheckForDeadlock));
            deadlockCheck.Start();

            RunThreads();

            deadlockCheck.Join();
        }

        static void RunThreads()
        {
            List<Thread> threads = new List<Thread>();

            for (int i = 0; i < NUM_THREADS; i++)
            {
                int threadIndex = i;
                Thread thread = new Thread(
                    new ParameterizedThreadStart(BeginThread));
                thread.Start(threadIndex);
                threads.Add(thread);
            }

            foreach(var thread in threads)
            {
                thread.Join();
            }
        }

        static void BeginThread(object threadIndex)
        {
            Debug.Assert(SynchronizationContext.Current == null);
            Debug.Assert(Thread.CurrentThread.IsThreadPoolThread == false);
            ThreadLoop((int)threadIndex);
        }

        static void ThreadLoop(int threadIndex)
        {
            Random random = new Random(threadIndex);

            while (true)
            {
                try
                {
                    GoogleDrive.Test(random);
                }
                catch(Exception ex)
                {
                    Console.WriteLine(ex.Message);
                }

                Interlocked.Exchange(ref s_lastExecute[threadIndex], DateTime.Now.ToBinary());
                Interlocked.Increment(ref count);
            }
        }

        private static void CheckForDeadlock()
        {
            Console.WriteLine("Deadlock check started");

            TimeSpan period = TimeSpan.FromMinutes(1);
            TimeSpan deadlockThreshold = TimeSpan.FromMinutes(10);

            while (true)
            {
                Thread.Sleep((int)period.TotalMilliseconds);

                DateTime now = DateTime.Now;
                TimeSpan oldestUpdate = TimeSpan.MinValue;

                for (int i = 0; i < NUM_THREADS; i++)
                {
                    DateTime lastExecute = DateTime.FromBinary(
                        Interlocked.Read(ref s_lastExecute[i]));

                    TimeSpan delta = now - lastExecute;
                    if(delta > oldestUpdate)
                    {
                        oldestUpdate = delta;
                    }

                    if (delta > deadlockThreshold)
                    {
                        var msg = string.Format("Deadlock detected in thread {0} for {1} minutes",
                            i.ToString(), (now - lastExecute).TotalMinutes);
                        Console.WriteLine(msg);
                    }
                }

                int workerThreads, completionPortThreads;
                System.Threading.ThreadPool.GetAvailableThreads(out workerThreads, out completionPortThreads);

                Console.WriteLine("Checked for deadlocks.");
                Console.WriteLine("\tWorker threads: " + workerThreads.ToString());
                Console.WriteLine("\tCompletion port threads: " + completionPortThreads.ToString());
                Console.WriteLine("\tExecute calls: " + Interlocked.Read(ref count).ToString());
                Console.WriteLine("\tOldest update (minutes): " + oldestUpdate.TotalMinutes.ToString());
            }
        }
    }

    class GoogleDrive
    {
        const string SERVICE_ACCOUNT = @"<path_to_service_account>";

        static string[] SCOPES = { DriveService.Scope.Drive };

        public static DriveService GetDriveService(string user)
        {
            GoogleCredential credential;
            using (var stream = new FileStream(SERVICE_ACCOUNT, FileMode.Open, FileAccess.Read))
            {
                credential = GoogleCredential
                    .FromStream(stream)
                    .CreateScoped(SCOPES)
                    .CreateWithUser(user);
            }

            var service = new DriveService(new DriveService.Initializer()
            {
                HttpClientInitializer = credential
            });

            return service;
        }

        public static void Test(Random random)
        {
            int userIndex = random.Next(Users.USERS.Length);
            string user = Users.USERS[userIndex];
            using (DriveService service = GetDriveService(user))
            {
                var request = service.Files.List();
                var result = request.Execute();
            }
        }
    }

    public static class Users
    {
        public static string[] USERS = new string[]
        {
            "user0000@domain.com",
            "user0001@domain.com",
            ...
        };
    }
}

Running this test overnight gave me the following:

Deadlock detected in thread 15 for 274.216744496667 minutes
Deadlock detected in thread 45 for 154.73506413 minutes
Deadlock detected in thread 46 for 844.978023301667 minutes
Checked for deadlocks.
        Worker threads: 2045
        Completion port threads: 989
        Execute calls: 2153228
        Oldest update (minutes): 844.978023301667

Once the deadlock is detected I can insert a break in the thread loop and let the running threads exit. This leaves me with the main thread, the timer thread, two threads used by the runtime and my deadlocked threads (three in this case. Also note that thread IDs don't match up because I wasn't smart enough to use the actual thread IDs):

Process threads

Each thread has the following call stack:

    mscorlib.dll!System.Threading.Monitor.Wait(object obj, int millisecondsTimeout, bool exitContext)
    mscorlib.dll!System.Threading.Monitor.Wait(object obj, int millisecondsTimeout)
    mscorlib.dll!System.Threading.ManualResetEventSlim.Wait(int millisecondsTimeout, System.Threading.CancellationToken cancellationToken)
    mscorlib.dll!System.Threading.Tasks.Task.SpinThenBlockingWait(int millisecondsTimeout, System.Threading.CancellationToken cancellationToken)
    mscorlib.dll!System.Threading.Tasks.Task.InternalWait(int millisecondsTimeout, System.Threading.CancellationToken cancellationToken)
    mscorlib.dll!System.Threading.Tasks.Task<System.Net.Http.HttpResponseMessage>.GetResultCore(bool waitCompletionNotification)
    mscorlib.dll!System.Threading.Tasks.Task<System.__Canon>.Result.get()
    Google.Apis.dll!Google.Apis.Requests.ClientServiceRequest<Google.Apis.Drive.v2.Data.FileList>.Execute()
>   DeadlockTest.exe!DeadlockTest.GoogleDrive.Test(System.Random random)
    DeadlockTest.exe!DeadlockTest.Program.ThreadLoop(int threadIndex)
    DeadlockTest.exe!DeadlockTest.Program.BeginThread(object threadIndex)
    mscorlib.dll!System.Threading.ThreadHelper.ThreadStart_Context(object state)
    mscorlib.dll!System.Threading.ExecutionContext.RunInternal(System.Threading.ExecutionContext executionContext, System.Threading.ContextCallback callback, object state, bool preserveSyncCtx)
    mscorlib.dll!System.Threading.ExecutionContext.Run(System.Threading.ExecutionContext executionContext, System.Threading.ContextCallback callback, object state, bool preserveSyncCtx)
    mscorlib.dll!System.Threading.ExecutionContext.Run(System.Threading.ExecutionContext executionContext, System.Threading.ContextCallback callback, object state)
    mscorlib.dll!System.Threading.ThreadHelper.ThreadStart(object obj)

This also leaves me with the following remaining tasks:

Scheduled Tasks

I do not know of a way to find out exactly which thread a task is scheduled on, but I think it's obvious that they are scheduled for the deadlocked threads (i.e. they are the source of the deadlock).

This bring me to my questions:

  • Why are the continuation tasks not being scheduled in the thread pool instead of the calling thread?
  • Is there a way to force the tasks to run on the thread pool, but not have a thread pool thread block on the result?
AustinWBryan
  • 3,249
  • 3
  • 24
  • 42
Alden
  • 2,229
  • 1
  • 15
  • 21
  • `.Result` is always unsafe and, IMO, should be deprecated. The only reason people use it these days (in my experience) is to try to bridge async/sync and it just plain will always open things up to deadlocks. I know you want to postpone the asyncopolypse, but really, it's the only things that's going to make your code *solid*. Consider the time you've already wasted littering your code with `ConfigureAwait` calls that are still prone to failure as long as you're trying to do this bridging act. – Damien_The_Unbeliever Aug 23 '18 at 18:11
  • @Damien_The_Unbeliever Did you follow the link to that Execute() method? Shockingly that's Google's code. I think I'd literally report this as a bug in their library; someone clearly didn't run that one by John Skeet! – Todd Menier Aug 24 '18 at 12:48
  • @Alden: I don't know what the issue is, but perhaps you can try wrapping those async calls in `Task.Run(…).Result` and force everything to run on the thread pool. – Michael Liu Aug 24 '18 at 14:52

1 Answers1

0

I'm going to assume the real goal here is to eliminate the deadlocks, and to that end I don't believe that what threads your continuations are running on is even relevant. I see 2 issues:

  1. Google's Execute() method is just plain wrong. I would even consider reporting a bug, because HttpClient does not support synchronous calls, and blocking on calls using .Result is an invitation for deadlocks, period. There is no way around that.

  2. You're probably compounding the likelihood of deadlocks by forcing new threads into the the mix. A common misconception is that concurrency requires threads, and in the case of I/O bound work that is not true. Waiting for results of I/O (what your code is likely spending most of its time doing) requires no thread at all. The underlying subsystems are complex, but they're elegantly abstracted by asynchronous Task-returning APIs like HttpClient. Use Tasks instead of threads and those subsystems will decide when you need a new thread and when you don't.

All of this leads to a conclusion you were hoping to avoid - go asynchronous in your code. You say that's not feasible right now, so as much as I hate to suggest code that's not 100% correct, hopefully a fair compromise is to get part-way there in exchange for a dramatic decrease in the likelihood of deadlocks.

If feasible, my suggestion would be to refactor your thread-creating method (RunThreads in your MCVE) to use Tasks (let's call it RunTasks instead), and convert everything down the call stack to async, ending at a call to Google's ExecuteAsync instead of Execute. Here's how the important bits would look:

static void RunTasks()
{
    List<Task> tasks = new List<Task>();

    for (int i = 0; i < NUM_TASKS; i++)
    {
        tasks.Add(BeginTaskAsync(i));
    }

    // Your long-term goal should be to replace this with await Task.WhenAll(tasks);
    Task.WaitAll(tasks);
}

// work down your call stack here and convert methods to use async/await,
// eventually calling await ExecuteAsync() from the Google lib...

static async Task BeginTaskAsync(int taskIndex)
{
    ...
    await ThreadLoopAsync(taskIndex);
}

static async Task ThreadLoopAsync(int taskIndex)
{
    Random random = new Random(taskIndex);

    while (true)
    {
        try
        {
            await GoogleDrive.TestAsync(random);
        }
        ...
    }
}

class GoogleDrive
{
    ...

    public static async Task TestAsync(Random random)
    {
        ...
        using (DriveService service = GetDriveService(user))
        {
            var request = service.Files.List();
            var result = await request.ExecuteAsync();
        }
    }
}

The "compromise" I mentioned is that call to Task.WaitAll. That is a blocking call, and therefore there's still no guarantee that you won't run into deadlocks here. But this should be a big improvement if you don't have the time/resources to properly go async all the way up the call stack. You have far less thread blocking, and far fewer threads in general.

Todd Menier
  • 37,557
  • 17
  • 150
  • 173