In Real World Haskell, Chapter 28, "Software transactional memory", a concurrent web link checker is developed. It fetches all the links in a web page and issues a HEAD request for each of them to figure out whether the link is active. A concurrent approach is taken to build this program, and the following statement is made:
We can't simply create one thread per URL, because that may overburden either our CPU or our network connection if (as we expect) most of the links are live and responsive. Instead, we use a fixed number of worker threads, which fetch URLs to download from a queue.
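To make the question concrete, here is a minimal sketch of the pattern the book describes (my own code, not the book's): a fixed number of workers pull URLs from a shared STM queue, and checkUrl is a stand-in for whatever HEAD-request logic the checker uses.

```haskell
import Control.Concurrent (forkIO)
import Control.Concurrent.STM
import Control.Monad (replicateM_)

-- k worker threads pull URLs from a shared queue; a Nothing in the
-- queue tells a worker to stop, and a counter records finished workers.
checkWithPool :: Int -> (String -> IO ()) -> [String] -> IO ()
checkWithPool k checkUrl urls = do
  queue <- newTChanIO
  done  <- newTVarIO (0 :: Int)
  atomically $ do
    mapM_ (writeTChan queue . Just) urls      -- enqueue the work
    replicateM_ k (writeTChan queue Nothing)  -- one stop marker per worker
  replicateM_ k (forkIO (worker queue done))
  -- Block until all k workers have hit their stop marker.
  atomically (readTVar done >>= check . (== k))
  where
    worker queue done = do
      job <- atomically (readTChan queue)
      case job of
        Nothing  -> atomically (modifyTVar' done (+ 1))
        Just url -> checkUrl url >> worker queue done
```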
I do not fully understand why this pool of threads is needed instead of simply calling forkIO for each link. AFAIK, the Haskell runtime multiplexes its lightweight threads over a small pool of OS threads and schedules them appropriately, so I do not see how the CPU would get overloaded. Furthermore, in a discussion about concurrency on the Haskell mailing list, I found the following statement going in the same direction:
The one paradigm that makes no sense in Haskell is worker threads (since the RTS does that for us); instead of fetching a worker, just forkIO instead.
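For comparison, the forkIO-per-link version I have in mind would look roughly like this (again just a sketch; checkUrl is the same stand-in as above, and exception handling is omitted):

```haskell
import Control.Concurrent (forkIO)
import Control.Concurrent.MVar
import Control.Monad (forM_, replicateM_)

-- One lightweight Haskell thread per URL; the RTS multiplexes them
-- over a handful of OS threads. The MVar only serves to wait for
-- all of them to finish.
checkAll :: (String -> IO ()) -> [String] -> IO ()
checkAll checkUrl urls = do
  finished <- newEmptyMVar
  forM_ urls $ \url -> forkIO $ do
    checkUrl url
    putMVar finished ()
  replicateM_ (length urls) (takeMVar finished)
```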
Is the pool of threads only required for the network part, or is there a CPU reason for it too?