Given the example from Control.Concurrent.Async:

do a1 <- async (getURL url1)
   a2 <- async (getURL url2)
   page1 <- wait a1
   page2 <- wait a2

Do the two getURL calls run on different OS threads, or just different green threads?

In case my question doesn't make sense... say the program is running on one OS thread only, will these calls still be made at the same time? Do blocking IO operations block the whole OS thread and all the green threads on that OS thread, or just one green thread?

zoran119
  • Does `getURL` make any FFI-based calls? – Daniel Wagner May 09 '19 at 14:17
  • @jberryman, I'm not sure you're right about that. The non-threaded IO subsystem has a bit of edge-case wonkiness where it's not quite right and probably can't be. I don't remember where that is. – dfeuer May 09 '19 at 17:35
  • @jberryman GHC does *not* use pre-emptive multitasking. With some care, you can write pure Haskell code (not even `IO` is required!) which blocks all other threads on the same capability. It is cooperative, with yields happening at memory allocation by default (this is almost always often enough) or at every function call if you turn on [`-fno-omit-yields`](https://downloads.haskell.org/~ghc/latest/docs/html/users_guide/using-optimisation.html#ghc-flag--fomit-yields). This is true of both the threaded and non-threaded runtime. [See also.](https://stackoverflow.com/q/55336948/791604) – Daniel Wagner May 09 '19 at 17:45
  • (erroneously deleted my comment, sorry!). @DanielWagner appreciate that point, but it seems to me it's debatable what to actually call GHC's model. In practice it feels preemptive, and users expect it to behave that way (albeit with a slightly sketchy implementation). That the compiler inserts yields at particular points to obtain (approximate) time-slicing is an implementation detail, don't you think? – jberryman May 09 '19 at 19:16
  • @jberryman Thinking of it abstractly as pre-emptive is a nice abstraction right up to the moment that the abstraction leaks. Then all of a sudden your tight, non-allocating loop is a bug instead of a feature, and your unsafe FFI call makes your program slower instead of faster. – Daniel Wagner May 09 '19 at 21:02

3 Answers

From the documentation of Control.Concurrent.Async:

> This module provides a set of operations for running IO operations asynchronously and waiting for their results. It is a thin layer over the basic concurrency operations provided by Control.Concurrent.

and Control.Concurrent:

> Scheduling of Haskell threads is done internally in the Haskell runtime system, and doesn't make use of any operating system-supplied thread packages.

This last may be a bit misleading if not interpreted carefully: although the scheduling of Haskell threads -- that is, the choice of which Haskell code to run next -- is done without using any OS facilities, GHC can and does use multiple OS threads to actually execute whatever code is chosen to be run, at least when using the threaded runtime system.
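To make the "thin layer" remark concrete, here is a minimal sketch of how `async` and `wait` could be built from `forkIO` and an `MVar`. The names `MiniAsync`, `miniAsync`, and `miniWait` are hypothetical and exist only for this illustration. The real library does more (for example cancellation and careful exception masking), so treat this as an approximation of the idea, not its actual implementation.

```haskell
import Control.Concurrent (forkIO, threadDelay)
import Control.Concurrent.MVar (MVar, newEmptyMVar, putMVar, readMVar)
import Control.Exception (SomeException, throwIO, try)

-- Hypothetical handle to a computation running in another Haskell thread.
-- The MVar is filled exactly once, with either an exception or the result.
newtype MiniAsync a = MiniAsync (MVar (Either SomeException a))

-- Fork a green thread that runs the action and stores its outcome.
miniAsync :: IO a -> IO (MiniAsync a)
miniAsync action = do
  var <- newEmptyMVar
  _ <- forkIO (try action >>= putMVar var)
  return (MiniAsync var)

-- Block the calling green thread until the outcome is available,
-- rethrowing the exception if the action failed.
miniWait :: MiniAsync a -> IO a
miniWait (MiniAsync var) = readMVar var >>= either throwIO return

main :: IO ()
main = do
  a1 <- miniAsync (threadDelay 100000 >> return (1 :: Int))
  a2 <- miniAsync (return 2)
  r1 <- miniWait a1
  r2 <- miniWait a2
  print (r1 + r2)  -- prints 3
```

Nothing here asks the operating system for a thread directly. `forkIO` creates a Haskell thread, and the runtime decides which OS thread executes it.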

Daniel Wagner
Micha Wiedenmann

It should all be green threads.

If your program is compiled (or rather, linked) with the single-threaded RTS, then all green threads run in a single OS thread. If your program is compiled (linked) with the multi-threaded RTS, then some arbitrary number of green threads are scheduled across (by default) one OS thread per CPU core.

As far as I'm aware, in either case blocking I/O calls should only block one green thread. Other green threads should be completely unaffected.
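Which runtime a given binary was linked with, and how many capabilities (OS threads running Haskell code) it is using, can be checked from inside the program. A small sketch using `rtsSupportsBoundThreads` and `getNumCapabilities` from Control.Concurrent follows. The helper name `runtimeInfo` is made up for this example.

```haskell
import Control.Concurrent (getNumCapabilities, rtsSupportsBoundThreads)

-- Hypothetical helper: reports whether the threaded RTS is linked in
-- and how many capabilities are currently active.
runtimeInfo :: IO (Bool, Int)
runtimeInfo = do
  caps <- getNumCapabilities
  return (rtsSupportsBoundThreads, caps)

main :: IO ()
main = do
  (threaded, caps) <- runtimeInfo
  putStrLn ("threaded RTS: " ++ show threaded)
  putStrLn ("capabilities: " ++ show caps)
```

Building it once with `ghc -threaded -rtsopts` and once without, then running with and without `+RTS -N`, shows what your GHC version actually does by default.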

MathematicalOrchid
  • Even with the threaded runtime, the default is one OS thread, not multiple. You have to explicitly ask for more threads, either via RTS options or programmatically. – Daniel Wagner May 09 '19 at 14:12
  • @DanielWagner I thought they changed the default to be one per CPU core (because people were confused by the default being only one thread). – MathematicalOrchid May 09 '19 at 14:28
  • Oh, hm. It doesn't seem to be that way in 8.4.3. I don't have an 8.6.* to check on at the moment, though none of the 8.6.* release notes mention the word "thread" anywhere. – Daniel Wagner May 09 '19 at 14:29
  • Also, there's the FFI thread pool to consider, which isn't counted by the `-N` argument. – Carl May 09 '19 at 14:31

This isn't as simple as the question seems to imply. Haskell is a more capable programming language than most you would have run into. In particular, IO operations that appear to block from an internal point of view may be implemented as the sequence "start non-blocking IO operation, suspend thread, wait for that IO operation to complete in an IO manager that covers multiple Haskell threads, queue thread for resumption once the IO device is ready."

See waitRead# and waitWrite# for the API that provides that functionality with the standard global IO manager.

Using green threads or not is mostly irrelevant with this pattern. IO operations can be written to use non-blocking IO behind the scenes, with proper multiplexing, while appearing to present a blocking interface to their users.

Unfortunately, it's not that simple either. The fact is that OS limitations get in the way. Until very recently (I think the 5.1 kernel was released yesterday, maybe?), Linux provided no good interface for non-blocking disk operations. Sure, there were things that looked like they should work, but in practice they weren't very good. So disk reads/writes are actual blocking operations in GHC. (Not just on Linux, either. GHC doesn't have a lot of developers supporting it, so a lot of things are written with the same code that works on Linux, even if there are other alternatives.)

But it's not even as simple as "network operations are hidden non-blocking, disk operations are blocking". At least maybe not. I don't actually know, because it's so hard to find documentation on the non-threaded runtime. I know the threaded runtime actually maintains a separate thread pool for performing FFI calls marked as "safe", which prevents them from blocking execution of green threads. I don't know if the same is true with the non-threaded runtime.

But for your example, I can say this: assuming getURL uses the standard network library (it's a hypothetical function anyway), it'll be doing non-blocking IO with proper multiplexing behind the scenes. So those operations will be truly concurrent, even without the threaded runtime.
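A rough way to see this overlap for yourself is to time two green threads that each wait for a while. The sketch below uses only `base`. `fakeFetch` is a hypothetical stand-in for `getURL`, and it uses `threadDelay` instead of real network IO, so it only shows that one waiting Haskell thread does not hold up the other. It does not exercise the socket code path.

```haskell
import Control.Concurrent (forkIO, threadDelay)
import Control.Concurrent.MVar (newEmptyMVar, putMVar, takeMVar)
import GHC.Clock (getMonotonicTime)

-- Hypothetical stand-in for getURL: waits half a second, then
-- returns a dummy page. No real network IO happens here.
fakeFetch :: String -> IO String
fakeFetch name = do
  threadDelay 500000
  return ("page for " ++ name)

-- Run two fetches in separate green threads and report the results
-- together with the elapsed wall-clock time in seconds.
fetchBoth :: IO ((String, String), Double)
fetchBoth = do
  start <- getMonotonicTime
  v1 <- newEmptyMVar
  v2 <- newEmptyMVar
  _ <- forkIO (fakeFetch "url1" >>= putMVar v1)
  _ <- forkIO (fakeFetch "url2" >>= putMVar v2)
  p1 <- takeMVar v1
  p2 <- takeMVar v2
  end <- getMonotonicTime
  return ((p1, p2), end - start)

main :: IO ()
main = do
  (pages, elapsed) <- fetchBoth
  print pages
  putStrLn ("elapsed seconds: " ++ show elapsed)
```

If the waits overlap, the elapsed time should come out near half a second, not a full second, with or without `-threaded`.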

Carl