When working with tasks, a rule of thumb appears to be that the thread pool - typically used by e.g. invoking Task.Run() or Parallel.Invoke() - should be used for relatively short operations. When working with long-running operations, we are supposed to use the TaskCreationOptions.LongRunning flag in order to - as far as I understand it - avoid clogging the thread pool queue, i.e. to push work to a newly-created thread.

But what exactly is a long-running operation? How long is long, in terms of time? Are there other factors besides the expected task duration to consider when deciding whether or not to use LongRunning, such as the anticipated CPU architecture (frequency, number of cores, ...) or the number of tasks the programmer intends to run at once?

For example, suppose I have 500 tasks to process in a dedicated application, each taking 10-20 seconds to complete. Should I just start all 500 tasks using Task.Run (e.g. in a loop) and then await them all, perhaps as LongRunning, while leaving the default max level of concurrency? Then again, if I set LongRunning in such a case, wouldn't this create 500 new threads and actually cause a lot of overhead and higher memory usage (due to the extra threads being allocated) compared to omitting LongRunning? This is assuming that no new tasks will be scheduled for execution while these 500 are being awaited.

I would guess that the decision to set LongRunning depends on the number of requests made to the thread pool in a given time interval, and that LongRunning should only be used for tasks that are expected to take significantly longer than the majority of the tasks placed on the thread pool - by definition, at most a small percentage of all tasks. In other words, this appears to be a queuing and thread pool utilization optimization problem that should likely be solved case-by-case through testing, if at all. Am I correct?

w128

4 Answers

It kind of doesn't matter. The problem isn't really about time, it's about what your code is doing. If you're doing asynchronous I/O, you're only using the thread for the short amount of time between individual requests. If you're doing CPU work... well, you're using the CPU. There's no "thread-pool starvation", because the CPUs are fully utilized.

The real problem is when you're doing blocking work that doesn't use the CPU. In a case like that, thread-pool starvation leads to CPU underutilization - you said "I need the CPU for my work" and then you don't actually use it.

If you're not using blocking APIs, there's no point in starting a task with LongRunning. If you have to run some legacy blocking code asynchronously, using LongRunning may be a good idea. Total work time isn't as important as "how often you are doing this". If you spin up one thread based on a user clicking on a GUI, the cost is tiny compared to all the latencies already included in the act of clicking a button in the first place, and you can use LongRunning just fine to avoid the thread-pool. If you're running a loop that spawns lots of blocking tasks... stop doing that. It's a bad idea :D

For example, imagine there is no asynchronous alternative to File.Exists. So if you see that this is giving you trouble (e.g. over a faulty network connection), you'd fire it up using Task.Factory.StartNew - and since you're not doing CPU work, you'd use LongRunning.
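As a minimal sketch of that File.Exists case: Task.Run has no overload that accepts TaskCreationOptions, so the flag has to go through Task.Factory.StartNew. The class name, helper name, and file path below are hypothetical and only there for illustration. Note that LongRunning is a hint to the scheduler, not a guarantee of a dedicated thread.

```csharp
using System;
using System.IO;
using System.Threading;
using System.Threading.Tasks;

public static class BlockingCheck
{
    // Hypothetical helper: runs the blocking File.Exists call as a LongRunning
    // task, hinting the scheduler to keep it off the regular pool threads.
    public static Task<bool> FileExistsAsync(string path)
    {
        return Task.Factory.StartNew(
            () => File.Exists(path),
            CancellationToken.None,
            TaskCreationOptions.LongRunning,
            TaskScheduler.Default); // explicit scheduler, not TaskScheduler.Current
    }

    public static async Task Main()
    {
        // Hypothetical path; a slow network share is where this would matter.
        string path = Path.Combine(Path.GetTempPath(), "example-report.txt");
        bool exists = await FileExistsAsync(path);
        Console.WriteLine(exists);
    }
}
```

This fits the one-off case described above (a single call triggered by a user action), not a loop that spawns hundreds of such tasks.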

In contrast, if you need to do some image manipulation that's basically 100% CPU work, it doesn't matter how long the operation takes - it's not a LongRunning thing.

And finally, the most common scenario for using LongRunning is when your "work" is actually the old-school "loop and periodically check if something should be done, do it and then loop again". Long running, but 99% of the time just blocking on some wait handle or something like that. Again, this is only useful when dealing with code that isn't CPU-bound, but that doesn't have proper asynchronous APIs. You might find something like this if you ever need to write your own SynchronizationContext, for example.

Now, how do we apply this to your example? Well, we can't, not without more information. If your code is CPU-bound, Parallel.For and friends are what you want - those ensure you only use enough threads to saturate the CPUs, and it's fine to use the thread-pool for that. If it's not CPU-bound... you don't really have any option besides using LongRunning if you want to run the tasks in parallel. Ideally, such work would consist of asynchronous calls you can safely invoke and await Task.WhenAll(...) from your own thread.
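The two preferred shapes for the 500-item example might look roughly like this. It is only a sketch: Crunch and FetchAsync are hypothetical stand-ins for the real CPU-bound computation and the real asynchronous I/O call, and Task.Delay merely simulates I/O latency.

```csharp
using System;
using System.Linq;
using System.Threading.Tasks;

public static class FiveHundredItems
{
    // Hypothetical CPU-bound work item (placeholder arithmetic).
    public static long Crunch(int seed)
    {
        long acc = seed;
        for (int i = 0; i < 100_000; i++)
        {
            acc = (acc * 31 + i) % 1_000_003;
        }
        return acc;
    }

    // Hypothetical asynchronous I/O item; Task.Delay stands in for a real async API.
    public static async Task<int> FetchAsync(int id)
    {
        await Task.Delay(100);
        return id;
    }

    public static async Task Main()
    {
        // CPU-bound: Parallel.For spreads the range over thread-pool threads
        // and blocks until every iteration has finished.
        var results = new long[500];
        Parallel.For(0, results.Length, i => results[i] = Crunch(i));

        // Asynchronous I/O: start all operations, then await them together.
        // No thread is held while the operations are pending.
        int[] ids = await Task.WhenAll(
            Enumerable.Range(0, 500).Select(id => FetchAsync(id)));

        Console.WriteLine($"{results.Length} computed, {ids.Length} fetched");
    }
}
```

Neither variant needs LongRunning: the first keeps the cores busy, and the second does not occupy threads while waiting.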

Rob Mensching
Luaan
  • Thank you for a detailed answer. Yes, I was assuming CPU-bound tasks. However, in case of IO bound tasks, why would `LongRunning` have anything to do with _parallelism_? Wouldn't spawning 500 IO-bound tasks with `LongRunning = false` just put them on the thread pool (assuming enough space), while `LongRunning = true` would create 500 new threads - with the perceived concurrency and responsiveness being near the same to the user, or indeed worse in the latter case due to extra thread creation overhead? – w128 Apr 14 '16 at 15:02
  • @w128 It's about how the thread-pool allocates new threads. By default, it's balanced according to the number of your CPU cores (usually about twice the number of physical cores), which is great for CPU work. When you need *more* threads, the thread-pool allocates them as needed, but with a delay - IIRC it takes about 2s for a new thread to be added to the pool. So if you add 500 blocking tasks to the thread-pool at once, it will take at least 1000s for the pool to have enough threads to handle them all concurrently. `LongRunning` is limited only by the spin-up itself, which is much faster. – Luaan Apr 14 '16 at 17:18

> When working with tasks, a rule of thumb appears to be that the thread pool - typically used by e.g. invoking Task.Run(), or Parallel.Invoke() - should be used for relatively short operations. When working with long running operations, we are supposed to set the TaskCreationOptions.LongRunning to true in order to - as far as I understand it - avoid clogging the thread pool queue, i.e. to push work to a newly-created thread.

The vast majority of the time, you don't need to use LongRunning at all, because the thread pool will adjust to "losing" a thread to a long-running operation after 2 seconds.

The main problem with LongRunning is that it forces you to use the very dangerous StartNew API.
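The answer does not spell out what makes StartNew dangerous. Two commonly cited pitfalls, shown here as an illustrative sketch rather than as the author's own reasoning, are that StartNew uses TaskScheduler.Current unless a scheduler is passed explicitly, and that it does not unwrap async delegates. The class name and values are hypothetical.

```csharp
using System;
using System.Threading;
using System.Threading.Tasks;

public static class StartNewPitfalls
{
    public static async Task Main()
    {
        // An async lambda makes StartNew return Task<Task<int>>. The outer task
        // completes when the lambda reaches its first await, not when the work ends.
        Task<Task<int>> outer = Task.Factory.StartNew(
            async () =>
            {
                await Task.Delay(200); // the code after this await resumes on a pool thread
                return 42;
            },
            CancellationToken.None,
            TaskCreationOptions.LongRunning,
            TaskScheduler.Default); // leaving this out falls back to TaskScheduler.Current

        Task<int> inner = await outer; // only the delegate's synchronous part has run
        int value = await inner;       // the actual result arrives here
        Console.WriteLine(value);      // prints 42
    }
}
```

Calling Unwrap() on the outer task is the usual way to get a single Task<int> back. Combining LongRunning with an async delegate gains little in this sketch, since the delegate gives up its thread at the first await.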

> In other words, this appears to be a queuing and thread pool utilization optimization problem that should likely be solved case-by-case through testing, if at all. Am I correct?

Yes. You should never set LongRunning when first writing code. If you are seeing delays due to the thread pool injection rate, then you can carefully add LongRunning.

Stephen Cleary

You should not use TaskCreationOptions.LongRunning in your case. I would use Parallel.For.

The LongRunning option is not meant to be used when you're going to create a lot of tasks, as in your case. It is meant for creating a couple of tasks that will run for a long time.

By the way, I have never used this option in any similar scenario.

Zein Makki

As you point out, TaskCreationOptions.LongRunning's purpose is

> to allow the ThreadPool to continue to process work items even though one task is running for an extended period of time

As for when to use it:

> It's not a specific length per se... You'd typically only use LongRunning if you found through performance testing that not using it was causing long delays in the processing of other work.

Source

AGB
  • +1 For being the only answer to include a source link, which itself contains a response from Stephen Toub at Microsoft (your quote). Furthermore, this appears to be genuinely the only advice that one need consider when deciding whether or not to use the `LongRunning` flag with many tasks that need to be run asynchronously, and when `Parallel.For` doesn't apply; e.g., CPU intensive, long-running background work during each web request. – Dave Sexton Aug 21 '18 at 18:16