Please read the questions before marking as duplicate. This question implements the solution provided in this question and still experiences a deadlock.
I am debugging a large multi-threaded application that makes many concurrent calls to various Google APIs using the .Net client libraries and we are occasionally experiencing deadlocks during some requests. Some information about the application:
- Application is a Windows service (SynchronizationContext is null)
- .Net Framework version is 4.5
- All threads making API requests are not in the default thread pool.
Specifically, we are calling the Execute() method which uses an async method and blocks while waiting for the result
public TResponse Execute()
{
try
{
using (var response = ExecuteUnparsedAsync(CancellationToken.None).Result)
{
return ParseResponse(response).Result;
}
}
...
}
This in turn calls ExecuteUnparsedAsync() which executes HttpClient.SendAsync()
private async Task<HttpResponseMessage> ExecuteUnparsedAsync(CancellationToken cancellationToken)
{
using (var request = CreateRequest())
{
return await service.HttpClient.SendAsync(request, cancellationToken).ConfigureAwait(false);
}
}
Now, I understand that there are many ways we can experience deadlocks here, and that it would be best to change the application to use async methods instead to avoid them. Unfortunately, that would be a significant time investment that is not possible at this moment but may be in the future.
My specific issue is that all calling threads are not in the thread pool and because ConfigureAwait(false)
is being called I expect that the continuations would always be run in the thread pool, but that is not the case. Instead it appears that the continuations are being scheduled on the original calling thread and the thread deadlocks because it is waiting on the result.
Using the following MCVE I am able to produce a deadlock within a few hours.
using Google.Apis.Auth.OAuth2;
using Google.Apis.Drive.v2;
using System;
using System.Collections.Generic;
using System.Diagnostics;
using System.IO;
using System.Linq;
using System.Net;
using System.Text;
using System.Threading;
using System.Threading.Tasks;
namespace DeadlockTest
{
class Program
{
const int NUM_THREADS = 70;
static long[] s_lastExecute = new long[NUM_THREADS];
static long count = 0;
static void Main(string[] args)
{
ServicePointManager.DefaultConnectionLimit = 50;
for(int i = 0; i < s_lastExecute.Length; i++)
{
s_lastExecute[i] = DateTime.Now.ToBinary();
}
Thread deadlockCheck = new Thread(new ThreadStart(CheckForDeadlock));
deadlockCheck.Start();
RunThreads();
deadlockCheck.Join();
}
static void RunThreads()
{
List<Thread> threads = new List<Thread>();
for (int i = 0; i < NUM_THREADS; i++)
{
int threadIndex = i;
Thread thread = new Thread(
new ParameterizedThreadStart(BeginThread));
thread.Start(threadIndex);
threads.Add(thread);
}
foreach(var thread in threads)
{
thread.Join();
}
}
static void BeginThread(object threadIndex)
{
Debug.Assert(SynchronizationContext.Current == null);
Debug.Assert(Thread.CurrentThread.IsThreadPoolThread == false);
ThreadLoop((int)threadIndex);
}
static void ThreadLoop(int threadIndex)
{
Random random = new Random(threadIndex);
while (true)
{
try
{
GoogleDrive.Test(random);
}
catch(Exception ex)
{
Console.WriteLine(ex.Message);
}
Interlocked.Exchange(ref s_lastExecute[threadIndex], DateTime.Now.ToBinary());
Interlocked.Increment(ref count);
}
}
private static void CheckForDeadlock()
{
Console.WriteLine("Deadlock check started");
TimeSpan period = TimeSpan.FromMinutes(1);
TimeSpan deadlockThreshold = TimeSpan.FromMinutes(10);
while (true)
{
Thread.Sleep((int)period.TotalMilliseconds);
DateTime now = DateTime.Now;
TimeSpan oldestUpdate = TimeSpan.MinValue;
for (int i = 0; i < NUM_THREADS; i++)
{
DateTime lastExecute = DateTime.FromBinary(
Interlocked.Read(ref s_lastExecute[i]));
TimeSpan delta = now - lastExecute;
if(delta > oldestUpdate)
{
oldestUpdate = delta;
}
if (delta > deadlockThreshold)
{
var msg = string.Format("Deadlock detected in thread {0} for {1} minutes",
i.ToString(), (now - lastExecute).TotalMinutes);
Console.WriteLine(msg);
}
}
int workerThreads, completionPortThreads;
System.Threading.ThreadPool.GetAvailableThreads(out workerThreads, out completionPortThreads);
Console.WriteLine("Checked for deadlocks.");
Console.WriteLine("\tWorker threads: " + workerThreads.ToString());
Console.WriteLine("\tCompletion port threads: " + completionPortThreads.ToString());
Console.WriteLine("\tExecute calls: " + Interlocked.Read(ref count).ToString());
Console.WriteLine("\tOldest update (minutes): " + oldestUpdate.TotalMinutes.ToString());
}
}
}
class GoogleDrive
{
const string SERVICE_ACCOUNT = @"<path_to_service_account>";
static string[] SCOPES = { DriveService.Scope.Drive };
public static DriveService GetDriveService(string user)
{
GoogleCredential credential;
using (var stream = new FileStream(SERVICE_ACCOUNT, FileMode.Open, FileAccess.Read))
{
credential = GoogleCredential
.FromStream(stream)
.CreateScoped(SCOPES)
.CreateWithUser(user);
}
var service = new DriveService(new DriveService.Initializer()
{
HttpClientInitializer = credential
});
return service;
}
public static void Test(Random random)
{
int userIndex = random.Next(Users.USERS.Length);
string user = Users.USERS[userIndex];
using (DriveService service = GetDriveService(user))
{
var request = service.Files.List();
var result = request.Execute();
}
}
}
public static class Users
{
public static string[] USERS = new string[]
{
"user0000@domain.com",
"user0001@domain.com",
...
};
}
}
Running this test overnight gave me the following:
Deadlock detected in thread 15 for 274.216744496667 minutes
Deadlock detected in thread 45 for 154.73506413 minutes
Deadlock detected in thread 46 for 844.978023301667 minutes
Checked for deadlocks.
Worker threads: 2045
Completion port threads: 989
Execute calls: 2153228
Oldest update (minutes): 844.978023301667
Once the deadlock is detected I can insert a break in the thread loop and let the running threads exit. This leaves me with the main thread, the timer thread, two threads used by the runtime and my deadlocked threads (three in this case. Also note that thread IDs don't match up because I wasn't smart enough to use the actual thread IDs):
Each thread has the following call stack:
mscorlib.dll!System.Threading.Monitor.Wait(object obj, int millisecondsTimeout, bool exitContext)
mscorlib.dll!System.Threading.Monitor.Wait(object obj, int millisecondsTimeout)
mscorlib.dll!System.Threading.ManualResetEventSlim.Wait(int millisecondsTimeout, System.Threading.CancellationToken cancellationToken)
mscorlib.dll!System.Threading.Tasks.Task.SpinThenBlockingWait(int millisecondsTimeout, System.Threading.CancellationToken cancellationToken)
mscorlib.dll!System.Threading.Tasks.Task.InternalWait(int millisecondsTimeout, System.Threading.CancellationToken cancellationToken)
mscorlib.dll!System.Threading.Tasks.Task<System.Net.Http.HttpResponseMessage>.GetResultCore(bool waitCompletionNotification)
mscorlib.dll!System.Threading.Tasks.Task<System.__Canon>.Result.get()
Google.Apis.dll!Google.Apis.Requests.ClientServiceRequest<Google.Apis.Drive.v2.Data.FileList>.Execute()
> DeadlockTest.exe!DeadlockTest.GoogleDrive.Test(System.Random random)
DeadlockTest.exe!DeadlockTest.Program.ThreadLoop(int threadIndex)
DeadlockTest.exe!DeadlockTest.Program.BeginThread(object threadIndex)
mscorlib.dll!System.Threading.ThreadHelper.ThreadStart_Context(object state)
mscorlib.dll!System.Threading.ExecutionContext.RunInternal(System.Threading.ExecutionContext executionContext, System.Threading.ContextCallback callback, object state, bool preserveSyncCtx)
mscorlib.dll!System.Threading.ExecutionContext.Run(System.Threading.ExecutionContext executionContext, System.Threading.ContextCallback callback, object state, bool preserveSyncCtx)
mscorlib.dll!System.Threading.ExecutionContext.Run(System.Threading.ExecutionContext executionContext, System.Threading.ContextCallback callback, object state)
mscorlib.dll!System.Threading.ThreadHelper.ThreadStart(object obj)
This also leaves me with the following remaining tasks:
I do not know of a way to find out exactly which thread a task is scheduled on, but I think it's obvious that they are scheduled for the deadlocked threads (i.e. they are the source of the deadlock).
This bring me to my questions:
- Why are the continuation tasks not being scheduled in the thread pool instead of the calling thread?
- Is there a way to force the tasks to run on the thread pool, but not have a thread pool thread block on the result?