In Real World Haskell, Chapter 28, "Software transactional memory", a concurrent web link checker is developed. It fetches all the links in a web page and issues a HEAD request for each of them to figure out whether the link is active. A concurrent approach is taken to build this program, and the following statement is made:
We can't simply create one thread per URL, because that may overburden either our CPU or our network connection if (as we expect) most of the links are live and responsive. Instead, we use a fixed number of worker threads, which fetch URLs to download from a queue.
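To make the question concrete, here is a minimal sketch of the pattern the book describes (my own code, not the book's): a fixed number of workers pull URLs from a shared STM queue, and checkUrl is a stand-in for whatever HEAD-request logic the checker uses.

```haskell
import Control.Concurrent (forkIO)
import Control.Concurrent.STM
import Control.Monad (replicateM_)

-- k worker threads pull URLs from a shared queue; a Nothing in the
-- queue tells a worker to stop, and a counter records finished workers.
checkWithPool :: Int -> (String -> IO ()) -> [String] -> IO ()
checkWithPool k checkUrl urls = do
  queue <- newTChanIO
  done  <- newTVarIO (0 :: Int)
  atomically $ do
    mapM_ (writeTChan queue . Just) urls      -- enqueue the work
    replicateM_ k (writeTChan queue Nothing)  -- one stop marker per worker
  replicateM_ k (forkIO (worker queue done))
  -- Block until all k workers have hit their stop marker.
  atomically (readTVar done >>= check . (== k))
  where
    worker queue done = do
      job <- atomically (readTChan queue)
      case job of
        Nothing  -> atomically (modifyTVar' done (+ 1))
        Just url -> checkUrl url >> worker queue done
```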
I do not fully understand why this pool of threads is needed instead of simply calling forkIO for each link. AFAIK, the Haskell runtime multiplexes its lightweight threads over a small pool of OS threads and schedules them appropriately, so I do not see how the CPU would get overloaded. Furthermore, in a discussion about concurrency on the Haskell mailing list, I found the following statement going in the same direction:
The one paradigm that makes no sense in Haskell is worker threads (since the RTS does that for us); instead of fetching a worker, just forkIO instead.
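For comparison, the forkIO-per-link version I have in mind would look roughly like this (again just a sketch; checkUrl is the same stand-in as above, and exception handling is omitted):

```haskell
import Control.Concurrent (forkIO)
import Control.Concurrent.MVar
import Control.Monad (forM_, replicateM_)

-- One lightweight Haskell thread per URL; the RTS multiplexes them
-- over a handful of OS threads. The MVar only serves to wait for
-- all of them to finish.
checkAll :: (String -> IO ()) -> [String] -> IO ()
checkAll checkUrl urls = do
  finished <- newEmptyMVar
  forM_ urls $ \url -> forkIO $ do
    checkUrl url
    putMVar finished ()
  replicateM_ (length urls) (takeMVar finished)
```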
Is the pool of threads only required for the network part, or is there a CPU reason for it too?