May I have Project Loom Clarified?

Question

Brian Goetz got me excited about project Loom and, in order to fully appreciate it, I'll need some clarification on the status quo.

My understanding is as follows: currently, in order to have real parallelism, we need one thread per CPU core. 1) Is there then any point in having n+1 threads on an n-core machine? Project Loom will bring us virtually limitless threads/fibres by having the JVM run each task on a virtual thread that the JVM schedules itself. 2) Will that be truly parallel? 3) How, specifically, will that differ from the aforementioned scenario of "n+1 threads on an n-core machine"?

Thanks for your time.

n+1 usually assumes a thread would stall for IO and another thread can use the otherwise wasted resource because both CPU & IO tasks are merged. If you separate them, then m:n works, but you must be aware and actively separate those tasks — Martheen, Apr 18 '22 at 11:23
Project loom tries to bring concurrency, not parallelism. Parallelism solves one task, distributed on multiple threads. Concurrency means multiple tasks competing for the same resources. — Васил Егов, Apr 18 '22 at 19:10

score 4 · Accepted Answer · answered Apr 18 '22 at 17:19

Virtual threads allow for concurrency (IO bound), not parallelism (CPU bound). They represent causal simultaneity, but not resource usage simultaneity.

In fact, if two virtual threads are in an IO bound* state (awaiting a return from a REST call, for example), then no OS thread is being used at all. Two normal threads in the same situation (if not using reactive or CompletableFuture-style code) would both be blocked and unavailable for other work until the calls complete.

*Except under certain conditions (e.g., use of synchronized instead of ReentrantLock, blocking that occurs in a native method, and possibly some other minor areas).
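To make the point about blocked virtual threads concrete, here is a minimal sketch, assuming JDK 21 or later. The class name `VirtualThreadSketch`, the method `runBlockingTasks`, and the use of `Thread.sleep` as a stand-in for a REST call are my own assumptions for illustration, not something from the question.

```java
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.atomic.AtomicInteger;

public class VirtualThreadSketch {

    // Starts taskCount virtual threads that each block for blockMillis,
    // then returns how many of them ran to completion.
    static int runBlockingTasks(int taskCount, long blockMillis) {
        AtomicInteger completed = new AtomicInteger();
        // One new virtual thread per submitted task (JDK 21+).
        try (ExecutorService executor = Executors.newVirtualThreadPerTaskExecutor()) {
            for (int i = 0; i < taskCount; i++) {
                executor.submit(() -> {
                    // Stand-in for a blocking REST call (an assumption).
                    // While sleeping, the virtual thread is unmounted and
                    // does not hold on to an OS thread.
                    Thread.sleep(blockMillis);
                    return completed.incrementAndGet();
                });
            }
        } // close() waits for all submitted tasks to finish
        return completed.get();
    }

    public static void main(String[] args) {
        long start = System.nanoTime();
        int done = runBlockingTasks(10_000, 200);
        long elapsedMs = (System.nanoTime() - start) / 1_000_000;
        System.out.println(done + " blocking tasks finished in " + elapsedMs
                + " ms on " + Runtime.getRuntime().availableProcessors()
                + " hardware threads");
    }
}
```

The task count is far larger than the number of cores, yet the run should take roughly as long as a single sleep, because the waiting is concurrent. If the tasks were CPU-bound instead, the core count would still cap the throughput.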

Soonts · Answer 2 · 2022-11-09T13:18:27.630

is there then any point in having n+1 threads on an n-core machine?

For one, many modern n-core machines expose n*2 hardware threads, because each core runs 2 hardware threads (simultaneous multithreading).

Sometimes it does make sense to spawn more OS threads than hardware threads. That’s the case when some OS threads are asleep waiting for something. For instance, on Linux, until io_uring arrived a couple of years ago, there was no good way to implement asynchronous I/O for files on local disks. Traditionally, disk-heavy applications spawned more threads than CPU cores and used blocking I/O.
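A rough sketch of that traditional approach, using plain platform threads. The names `OversubscribedPoolSketch` and `runBlockingTasks`, the 4x multiplier, and the sleep that stands in for a blocking disk read are all assumptions for illustration.

```java
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.atomic.AtomicInteger;

public class OversubscribedPoolSketch {

    // Runs taskCount blocking tasks on a fixed pool of threadCount OS threads
    // and returns how many tasks completed.
    static int runBlockingTasks(int threadCount, int taskCount, long blockMillis) {
        AtomicInteger completed = new AtomicInteger();
        ExecutorService pool = Executors.newFixedThreadPool(threadCount);
        for (int i = 0; i < taskCount; i++) {
            pool.submit(() -> {
                // Stand-in for a blocking disk read (an assumption).
                Thread.sleep(blockMillis);
                return completed.incrementAndGet();
            });
        }
        pool.shutdown(); // accept no new tasks, let queued ones finish
        try {
            if (!pool.awaitTermination(1, TimeUnit.MINUTES)) {
                pool.shutdownNow();
            }
        } catch (InterruptedException e) {
            pool.shutdownNow();
            Thread.currentThread().interrupt();
        }
        return completed.get();
    }

    public static void main(String[] args) {
        // Logical processors, i.e. hardware threads as seen by the JVM.
        int hw = Runtime.getRuntime().availableProcessors();
        int tasks = hw * 8;
        for (int threads : new int[] {hw, hw * 4}) {
            long start = System.nanoTime();
            int done = runBlockingTasks(threads, tasks, 100);
            long elapsedMs = (System.nanoTime() - start) / 1_000_000;
            System.out.println(threads + " threads: " + done
                    + " tasks in " + elapsedMs + " ms");
        }
    }
}
```

With tasks that mostly wait, the larger pool should finish sooner even though the machine has no extra cores, which is the case where n+1 (or many more) threads pays off.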

Will that be truly parallel?

Depends on the implementation: not just the language runtime, but also the I/O-related parts of the standard library. For instance, on Windows, when doing disk or network I/O in C# with async/await (an equivalent of Project Loom, released around 2012), these tasks are truly parallel: the OS kernel and drivers are indeed doing more work at the same time. AFAIK on Linux async/await is only truly parallel for sockets but not for files; for asynchronous file I/O it uses a pool of OS threads under the hood.

How, specifically, will that differ from the aforementioned scenario "n+1 threads on an n-core machine"?

OS threads are more expensive for a few reasons. (1) They require a native stack, so each OS thread consumes memory. (2) Memory is slow and processors have caches to compensate; switching between OS threads increases RAM bandwidth usage because the cached thread-specific data is no longer useful after a context switch. (3) OS schedulers have been improving for decades, but they’re still not free. One reason is that saving and restoring thread state to and from memory takes time.

The higher-level cooperative multitasking implemented in C# async/await or Java’s Loom causes way less overhead when switching contexts, compared to switching OS threads. At least in theory, this should improve both throughput and latency for I/O heavy applications.
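On the Java side, both kinds of thread sit behind the same `Thread` API, so the difference is in who schedules them and what a switch costs, not in how the code looks. A small sketch, assuming JDK 21 or later; `ThreadKindsSketch` and `ranOnVirtualThread` are hypothetical names for illustration.

```java
public class ThreadKindsSketch {

    // Runs a trivial task on a thread made by the given builder and reports
    // whether that thread was a virtual thread.
    static boolean ranOnVirtualThread(Thread.Builder builder) {
        boolean[] result = new boolean[1];
        Thread t = builder.start(() -> result[0] = Thread.currentThread().isVirtual());
        try {
            t.join(); // join() makes the write to result[0] visible here
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
        }
        return result[0];
    }

    public static void main(String[] args) {
        // Platform thread: a thin wrapper around an OS thread.
        System.out.println(ranOnVirtualThread(Thread.ofPlatform())); // prints false
        // Virtual thread: scheduled by the JVM onto carrier threads.
        System.out.println(ranOnVirtualThread(Thread.ofVirtual()));  // prints true
    }
}
```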
