
I am tasked with updating a C# application (non-GUI) that is very single-threaded in its operation, and adding multi-threading to it to turn its queues of work over more quickly.

Each thread will need to perform a very minimal amount of calculation, but most of the work will be calling and waiting on SQL Server requests. So, lots of waiting compared to CPU time.

A couple of requirements will be:

  • Running on some limited hardware (that is, just a couple of cores). The current system, when it's being "pushed", only takes about 25% CPU. But, since it's mostly waiting for the SQL Server (a different server) to respond, we would like the capability to have more threads than cores.
  • Be able to limit the number of threads. I can't just have an unlimited number of threads going. I don't mind doing the limiting myself via an Array, List, etc.
  • Be able to keep track of when these threads complete so that I can do some post-processing.

It just seems to me that the .NET Framework has so many different ways of doing threads that I'm not sure if one is better than another for this task. I'm not sure if I should be using Task, Thread, ThreadPool, or something else... It appears to me that the async/await model would not be a good fit in this case, though, as it waits on one specific task to complete.

Jim
    Can you please post the current code? – Enigmativity Aug 19 '20 at 22:15
  • @Enigmativity The current code is quite a large project, I wouldn't even know where to start to carve it up for even some snips. That's why I was asking a high-level question to try and get back a high-level answer. Sorry. – Jim Aug 19 '20 at 22:29
  • Then you should try to give some examples signatures of the calls that you're making at least. The tool to do the job need to match. It's a bit like you've asked us how to cut some wood - there are a million tools for cutting wood - but do you need a jigsaw or an axe? – Enigmativity Aug 20 '20 at 00:27
  • How are the queues of work managed in your current app? How does work get triggered? What's happening with the results? – Enigmativity Aug 20 '20 at 00:37

4 Answers


I'm not sure if I should be using Task, Thread, ThreadPool, something else...

In your case it matters less than you would think. You can focus on what fits your (existing) code style and dataflow the best.

since it's mostly doing waits for the SQL Server to respond

Your main goal would be to get as many of those SQL queries going in parallel as possible.

Be able to limit the number of threads.

Don't worry about that too much. On 4 cores, at 25% CPU, you can easily have 100 threads going (more on 64-bit). But you don't want thousands of threads: each .NET Thread reserves at least 1 MB for its stack, so estimate how much RAM you can spare.

So it depends on your application how many queries you can get running at the same time. Worry about thread-safety first.

When the number of parallel queries is > 1000, you will need async/await to run them on fewer threads.

As long as it is < 100, just let threads block on I/O. Parallel.ForEach(), Parallel.Invoke(), etc. look like good tools.
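As a minimal sketch of the blocking approach, assuming a synchronous `ProcessFile` method that does the SQL work for one item (`GetPendingFilePaths`, `ProcessFile`, and `PostProcess` are all placeholder names, not from the question):

```csharp
using System.Threading.Tasks;

var filePaths = GetPendingFilePaths(); // hypothetical: the queued work items

Parallel.ForEach(filePaths, new ParallelOptions
{
    MaxDegreeOfParallelism = 32 // cap the number of concurrent (mostly blocked) threads
},
filePath =>
{
    ProcessFile(filePath); // synchronous: short CPU burst, long SQL wait
});

// Parallel.ForEach returns only when every item has been processed,
// so post-processing can simply run on the next line.
PostProcess();
```

MaxDegreeOfParallelism is the knob that satisfies the "limit the number of threads" requirement without managing an Array or List of threads yourself.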

The 100 - 1000 range is the grey area.

H H

add multi-threading to it to get it to turn queues of work over quicker.

Each thread will need to perform a very minimal amount of calculation, but most of the work will be calling and waiting on SQL Server requests. So, lots of waiting compared to CPU time.

With that kind of processing, it's not clear how multithreading will benefit you. Multithreading is one form of concurrency, and since your workload is primarily I/O-bound, asynchrony (and not multithreading) would be the first thing to consider.

It just seems to me that the .NET Framework has so many different ways of doing threads, I'm not sure if one is better than the other for this task.

Indeed. For reference, Thread and ThreadPool are pretty much legacy these days; there are much better higher-level APIs. Using Task directly as a delegate task (e.g., Task.Factory.StartNew) should also be rare.

It appears to me that the async/await model would not be a good fit in this case, though, as it waits on one specific task to complete.

await waits on one task at a time, yes. Task.WhenAll can be used to combine multiple tasks into one, and then you can await the combined task.
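A short sketch of that pattern (`ProcessFileAsync`, `filePaths`, and `PostProcess` are hypothetical names standing in for your own code):

```csharp
using System.Linq;
using System.Threading.Tasks;

// Start the asynchronous work for every file; nothing blocks here.
var tasks = filePaths.Select(path => ProcessFileAsync(path)).ToList();

// Await the combined task; it completes when all of the files are done,
// and any exceptions from the individual tasks surface here.
await Task.WhenAll(tasks);

PostProcess(); // all work is finished at this point
```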

get it to turn queues of work over quicker.

Be able to limit the number of threads.

Be able to keep track of when these threads complete so that I can do some post-processing.

It sounds to me that TPL Dataflow would be the best approach for your system. Dataflow allows you to define a "pipeline" through which data flows, with some steps being asynchronous (e.g., querying SQL Server) and other steps being parallel (e.g., data processing).

I was asking a high-level question to try and get back a high-level answer.

You may be interested in my book.

Stephen Cleary
  • "With that kind of processing". Users will upload a number of items (data files) to be processed and right now the process is "single threaded". One file gets processed, then the next, then the next... If we have 15 users that uploaded around the same time, and each file takes about 1 minute to process, then whoever was last to upload will wait around 15 minutes. Since a lot of that 1 minute is waiting for SQL in various calls, we would like to process multiple files at the same time, so that the last uploader gets done sooner. – Jim Aug 19 '20 at 23:04

It appears to me that the async/await model would not be a good fit in this case, though, as it waits on one specific task to complete.

That is wrong. Async/await is just syntax that simplifies the state-machine mechanism behind asynchronous code. It waits without consuming any thread; in other words, the async keyword does not create a thread, and await does not hold up any thread.

Be able to limit the number of threads

See: How to limit the amount of concurrent async I/O operations?
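The usual technique from that question is a SemaphoreSlim used as a throttle. A sketch, where `ProcessFileAsync` and `filePaths` are placeholders for your SQL-bound work:

```csharp
using System.Linq;
using System.Threading;
using System.Threading.Tasks;

var throttler = new SemaphoreSlim(initialCount: 20); // at most 20 operations in flight

var tasks = filePaths.Select(async path =>
{
    await throttler.WaitAsync(); // asynchronously wait for a free slot
    try
    {
        await ProcessFileAsync(path);
    }
    finally
    {
        throttler.Release(); // free the slot so the next queued operation can start
    }
}).ToList();

await Task.WhenAll(tasks); // completes when every file has been processed
```

This limits concurrency (the second requirement) without limiting threads at all, since the waiting operations consume no thread.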

Be able to keep track of when these threads complete so that I can do some post-processing.

If you don't use the "fire and forget" pattern, then you can keep track of the task and its exceptions just by awaiting it:

var task = MethodAsync(); // start the asynchronous work
await task;               // wait for completion; exceptions surface here
PostProcessing();

async Task MethodAsync() { ... }

Or, for a similar approach, you can use ContinueWith (note that the continuation delegate receives the completed task):

var task = MethodAsync();
await task.ContinueWith(t => PostProcessing());

async Task MethodAsync() { ... }

Read more:

Releasing threads during async tasks

https://learn.microsoft.com/en-us/dotnet/standard/asynchronous-programming-patterns/?redirectedfrom=MSDN

Bizhan

The TPL Dataflow library is probably one of the best options for this job. Here is how you could construct a simple dataflow pipeline consisting of two blocks. The first block accepts a file path and produces some intermediate data that can later be inserted into the database. The second block consumes the data coming from the first block, by sending it to the database.

var inputBlock = new TransformBlock<string, IntermediateData>(filePath =>
{
    return GetIntermediateDataFromFilePath(filePath);
}, new ExecutionDataflowBlockOptions()
{
    MaxDegreeOfParallelism = Environment.ProcessorCount // What the local machine can handle
});

var databaseBlock = new ActionBlock<IntermediateData>(item =>
{
    SaveItemToDatabase(item);
}, new ExecutionDataflowBlockOptions()
{
    MaxDegreeOfParallelism = 20 // What the database server can handle
});

inputBlock.LinkTo(databaseBlock);

Now every time a user uploads a file, you just save the file in a temp path, and post the path to the first block:

inputBlock.Post(filePath);

And that's it. The data will flow from the first to the last block of the pipeline automatically, transformed and processed along the way, according to the configuration of each block.

This is an intentionally simplified example to demonstrate the basic functionality. A production-ready implementation will probably have more options defined, like a CancellationToken and a BoundedCapacity; will check the return value of inputBlock.Post to react in case the block can't accept the job; may propagate completion; will watch the databaseBlock.Completion property for errors; etc.
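Completion propagation, in particular, is a few lines. A sketch reusing the block names from the example above (the shutdown trigger is up to your application):

```csharp
using System.Threading.Tasks.Dataflow;

// Link with PropagateCompletion so that completing the first block
// automatically completes the second one when all items have drained.
inputBlock.LinkTo(databaseBlock, new DataflowLinkOptions
{
    PropagateCompletion = true
});

// When no more files will arrive (e.g., at application shutdown):
inputBlock.Complete();

// Await the last block; this returns when every posted item has been
// saved, and it also surfaces any exceptions thrown inside the pipeline.
await databaseBlock.Completion;
```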

If you are interested at following this route, it would be a good idea to study the library a bit, in order to become familiar with the options available. For example there is a TransformManyBlock available, suitable for producing multiple outputs from a single input. The BatchBlock may also be useful in some cases.

The TPL Dataflow library is built into .NET Core, and is available for .NET Framework as the System.Threading.Tasks.Dataflow NuGet package. It has some learning curve, and some gotchas to be aware of, but it's nothing terrible.

Theodor Zoulias