Why aren't Java threads both lightweight (like green threads) and multi-core capable? (backed by an internal fixed-size native thread pool)

Question

Back in Java 1.1, all threads ran on a single core (not taking advantage of the machine's multiple cores/CPUs) and were scheduled by the JVM in user space (so-called green threads).

Around Java 1.2 / 1.3 (depending on the underlying OS), a change was made and Java Thread objects were mapped to OS threads (pthreads in the case of Linux). This takes full advantage of multiple cores, but OTOH creating a thread became very expensive in terms of memory (because of the huge initial stack size of OS threads), which heavily limits the number of concurrent requests that a single machine can handle in the thread-per-request model. This required server-side architectures to switch to the asynchronous model (the non-blocking I/O package was introduced, AsyncContext was added to the servlet API, etc.), which has been confusing several generations of Java server-side devs to this day: at first glance most APIs look like they were intended for the thread-per-request model, and one needs to carefully read the API documentation to find the async capabilities bolted onto them.

Only recently has Project Loom aimed to deliver lightweight threads that are backed by a thread pool (a Java thread pool of "old-style" Java threads, which in turn map to OS threads), thus combining the advantages: threads that are cheap to create in large quantities, that utilize multiple cores, and that can be cheaply suspended on blocking operations (such as I/O).

Why is this happening only now, after 20 years, instead of right away in Java 1.3? I.e., why were Java threads made to map 1:1 to OS threads instead of being backed (executed) by a JVM-internal, fixed-size pool of OS threads sized to the number of available CPU cores?

Is it maybe difficult to implement in the JVM?
It does not seem much more complex than all the asynchronous programming that Java server-side devs have been forced to do for the last 20 years, and that C/C++ devs have always been doing, but maybe I'm missing something.

Another possibility is that there is some obstacle in the architectural design of the JVM that prevents it from being implemented this way.
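To make the model being asked about concrete: a rough user-level approximation is a fixed pool with one OS thread per available core, executing many small tasks. This is only a sketch using `java.util.concurrent` (which did not exist in Java 1.3), and the class and method names (`FixedPoolSketch`, `runTasks`) are made up for illustration. It does not show what the JVM would have to do internally.

```java
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.atomic.AtomicInteger;

class FixedPoolSketch {

    /** Runs taskCount small tasks on a pool with one OS thread per core; returns how many completed. */
    static int runTasks(int taskCount) throws InterruptedException {
        // Assumption: one pool thread per available core, as proposed in the question.
        int cores = Runtime.getRuntime().availableProcessors();
        ExecutorService pool = Executors.newFixedThreadPool(cores);
        AtomicInteger completed = new AtomicInteger();

        for (int i = 0; i < taskCount; i++) {
            // Each task is cheap to create: it is just an object in the pool's work queue.
            pool.execute(completed::incrementAndGet);
        }

        pool.shutdown(); // stop accepting new tasks, let queued ones finish
        if (!pool.awaitTermination(30, TimeUnit.SECONDS)) {
            throw new IllegalStateException("tasks did not finish in time");
        }
        return completed.get();
    }

    public static void main(String[] args) throws InterruptedException {
        System.out.println(runTasks(100_000) + " tasks completed");
    }
}
```

Note the limitation of this approximation: a task that blocks (on I/O, a lock, `Thread.sleep`) keeps its OS thread occupied, so it only behaves like a lightweight thread as long as tasks never block.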

UPDATE:
Project Loom's architecture design info was updated according to comments: many thanks!

Why, *exactly,* are you now asking this question – twenty years later? I suppose that you are now encountering a technical issue that needs to be solved? So ... what exactly is it? — Mike Robinson, Apr 30 '21 at 17:06
@MikeRobinson well, this question always bothered me subconsciously, I think, and when I found out about project loom it articulated itself right in front of me: "obviously! ...but why only now?" ;-) — morgwai, Apr 30 '21 at 17:09
Maybe – just guessing – it's because the technology has evolved. Or maybe, because the underlying *hardware* has advanced to the point *(yay!)* where it actually matters. So, do you think that you could now articulate this requirement into a technical specification that could now be addressed by the JVM team? — Mike Robinson, Apr 30 '21 at 18:07
@MikeRobinson my understanding is that it's pretty well articulated and worked on by the mentioned [project Loom](https://openjdk.java.net/projects/loom/): [their github](https://github.com/openjdk/loom) seems pretty active — morgwai, Apr 30 '21 at 18:49
I would be very grateful to anyone who marked this question for closing due to it being "opinion based" to explain in the comment why they think so. The question asks about specific technical reasons for which something was not implemented in JVM: how is this opinion based? I would like to kindly ask all people who do not understand the question to read the provided links before voting to close it. — morgwai, Apr 30 '21 at 19:05
1) "green threads" existed, it was a terrible implementation. 2) why exactly do you think project loom takes so many years to implement? it is a very challenging task. It's not just one of those "let's just do it", by far. It is going to be the biggest change to the runtime and language ever, imo. 3) just remember `HashSet` - _still_ uses `HashMap` under the hood and people don't care - things still work great. — Eugene, Apr 30 '21 at 19:37
@Eugene my question is **why** it is not "let's just do it", because it seems so to me for now (see my comparison to every day async programming). Obviously I must be missing something and hence my question ;-) — morgwai, Apr 30 '21 at 19:41
there are many challenges for loom, like thread-locals, what and how is that supposed to work? A thread local per non-carrier thread? If so, when you yield, where is that going to be stored and how it is reverted? Considering the huge amount of this "light" threads, how and where is this memory going to be consumed? Or "garbage collection", how do you implement a _good_ collection of these thread resources? you walk their stacks as GC roots? All of them? You see the obvious side as being not that complicated, under the hood the work is enormous. — Eugene, Apr 30 '21 at 19:46
@Eugene ThreadLocal will work the way it is now out of the box: it's a specialized map in a Thread object. Yielding is trivial also: a pthread executing a given java thread just puts it back at the end of its work queue. to the GC one I don't have an immediate answer for though ;-) Thanks! — morgwai, Apr 30 '21 at 19:59
" it's a specialized map in a Thread object" - for a _plain_ carrier thread it is; *just* puts it back at the end of its work queue? Sorry, you seem to oversimplify this to a point where it makes no sense. — Eugene, Apr 30 '21 at 20:04
Let us [continue this discussion in chat](https://chat.stackoverflow.com/rooms/231806/discussion-between-morgwai-and-eugene). — morgwai, Apr 30 '21 at 20:06
First of all, Java 1.3 had no thread pool API, so the idea to build lightweight threads atop that thread pool API could not occur. Further, there’s no sense in the claim “*what C/C++ devs have always been doing: pool of pthreads and work queue of Java Thread objects ready to be executed*”—How can a C/C++ developer maintain a “work queue of Java Thread objects”? Further, your timescale is far off. twenty years ago not even “*the asynchronous programming that java server-side devs have been forced to do for the last 20 years*” did exist. Neither did the native pthread support in common systems. — Holger, May 03 '21 at 10:27
@Holger regarding first: i didn't mean user java code threadpool, but c/c++ threadpool used in **jvm implementation**. Regarding further: you mixed similarly again. My timescale is pretty accurate: java threads were mapped to OS threads in version 1.2 on solaris and in 1.3 on linux which was in 2000. indeed until servlet 3.0 (~2009) asynchronous programming in servlet based web-apps was impossible/pointless, but I'm not talking only about web-apps. — morgwai, May 03 '21 at 15:12
@Holger to further clarify, by "work queue of java thread objects" I meant a work queue of tuples which is more-less what a java thread is from jvm's point of view. currently each such object is handled by a separate OS level pthread and I'm asking why instead not handle them all by a fixed-size pool of jvm internal OS-level pthreads. — morgwai, May 03 '21 at 15:55
There is no “c/c++ threadpool used in jvm implementation”. NPTL was finished in 2002, appearing in a Linux distribution around 2003 which [was a big deal back then](https://web.archive.org/web/20040405145146/http://java.sun.com/developer/technicalArticles/JavaTechandLinux/RedHat/). But a few more or less years don’t matter. The fundamental problem is your claim that C/C++ developers were routinely using what you suggest Java should have used, when in fact the entire technology was in its infancy. No one used such thread pools twenty years ago. I even have doubts about today’s C/C++ software… — Holger, May 04 '21 at 07:41
@Holger, yes I'm aware that jvm does not use a threadpool currently: that's what project Loom is aiming to do. My question is **why** it took so long to even start working on this. Regarding timeline again: I was introduced to ideas of threadPools and asynchronous server programming during my studies at the university in about 2001, so it must have been a pretty well known idea already, contrary to what you suggest that everything was in infancy. NPTL is not required for this, I think: Xavier Leroy's old Linux pthread implementation should be sufficient, although waaay less efficient. — morgwai, May 04 '21 at 08:16
Being introduced to concepts at a university is by far a different thing than the claimed routinely use by contemporary C/C++ developers. In practice, being far less efficient than the alternatives is a reason not to use it. I remember C/C++ developers alleging that these libraries were merely invented for Java. Besides that, you do not even understand project Loom. It does *not* use a “c/c++ threadpool” (can you name any real life C/C++ software having virtual threads atop a pool of native threads?). It runs virtual threads atop the Java thread pool/executor API that was introduced in Java 5. — Holger, May 04 '21 at 08:37
@Holger yes, indeed I don't know the details of implementation of Loom, but since current java threads map to OS threads, the final effect will be roughly the same. I'm not sure what you mean by "being far less efficient than the alternatives" and by "these libraries": let's continue on chat if you are interested to discuss this :) — morgwai, May 04 '21 at 09:21
@Holger more regarding thread-pool + asynchronous programming in C/C++: Google's C++ HTTP-related code base was using this model already for a few years at least when I first saw it in 2007 (not sure how long exactly though: I wasn't interested in history that much) — morgwai, May 04 '21 at 09:39
@Holger I've updated the question to contain more accurate info about Loom's strategy based on your comments: many thanks! :) — morgwai, May 04 '21 at 10:07
Wasn’t the topic “asynchronous programming [that java server-side devs have been forced to do for the last 20 years]” **vs** “lightweight threads ala Loom”? So your example is precisely **not** about lightweight threads. You keep switching back and forth between these terms. “asynchronous programming” in C/C++ was nothing new twenty years ago, such APIs existed. Native pthread implementations (and the required thread safe os functions) were in its infancy. In contrast, Java had thread support, but no “asynchronous programming” API back then. So the approaches used in practice were different. — Holger, May 04 '21 at 10:12

morgwai · Answer 1 · 2021-05-04T17:17:25.083

-2

After some consideration, it seems to me that JIT compilation of Java bytecode to native code may be the reason:
In the model I proposed, a native OS thread switching between Java threads would pick a tuple <thread_stack, thread_instruction_pointer> from its work queue. However, because of JIT, a Java thread's stack is basically the same thing as the backing OS thread's stack, which cannot be swapped out just like that AFAIK.
So, as I understand it, the approach I proposed would only be possible if the JVM interpreted the bytecode each time and kept Java threads' stacks on its heap, which is not the case.
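To illustrate what "keeping the thread's state on the heap" would mean, here is a toy sketch in which each task stores its own resume point and locals in fields, so any worker thread can pick it up from a shared queue and continue it. All names (`HeapStateSchedulerSketch`, `Task`, `step`, `run`) are hypothetical, and the task has to save its state by hand at each yield point. Ordinary Java methods keep their frames on the native stack, which is exactly the difficulty described above.

```java
import java.util.concurrent.ConcurrentLinkedQueue;
import java.util.concurrent.CountDownLatch;
import java.util.concurrent.atomic.AtomicLong;

class HeapStateSchedulerSketch {

    /** Hypothetical lightweight task: everything needed to resume it lives in heap fields. */
    static final class Task {
        private int nextStep = 0;     // plays the role of the saved instruction pointer
        private long partialSum = 0;  // plays the role of the saved local variables
        private final int totalSteps;

        Task(int totalSteps) { this.totalSteps = totalSteps; }

        /** Runs one slice up to the next yield point; returns true when the task is finished. */
        boolean step() {
            partialSum += nextStep;
            nextStep++;
            return nextStep >= totalSteps;
        }
    }

    /** Multiplexes taskCount tasks over a fixed number of worker threads; returns the sum of all results. */
    static long run(int taskCount, int stepsPerTask, int workers) throws InterruptedException {
        ConcurrentLinkedQueue<Task> queue = new ConcurrentLinkedQueue<>();
        for (int i = 0; i < taskCount; i++) {
            queue.add(new Task(stepsPerTask));
        }
        CountDownLatch remaining = new CountDownLatch(taskCount);
        AtomicLong total = new AtomicLong();

        Thread[] threads = new Thread[workers];
        for (int w = 0; w < workers; w++) {
            threads[w] = new Thread(() -> {
                while (remaining.getCount() > 0) {
                    Task task = queue.poll();
                    if (task == null) {
                        Thread.yield(); // nothing runnable right now; other workers hold the rest
                        continue;
                    }
                    if (task.step()) {
                        total.addAndGet(task.partialSum);
                        remaining.countDown();
                    } else {
                        queue.add(task); // "yield": put the task back at the end of the work queue
                    }
                }
            });
            threads[w].start();
        }
        for (Thread t : threads) {
            t.join();
        }
        return total.get();
    }

    public static void main(String[] args) throws InterruptedException {
        // 1000 tasks, each summing 0..9, spread over 4 worker threads
        System.out.println(run(1000, 10, 4));
    }
}
```

A task here may run its consecutive steps on different worker threads, which only works because nothing it needs is left on a worker's native stack between steps.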

edited May 04 '21 at 17:17

answered May 04 '21 at 11:23



An interpreter is a piece of (usually native) software too and can have an arbitrarily complex state to save and restore, just like other native or JIT compiled code. But you’re heading in the right direction when considering that a thread state is more complex than you initially thought. But another point is the requirement to replace every action that would block the native carrier thread (locking, but also operating system calls) with a non-blocking equivalent under the hood, to perform a lightweight task switch and switch back when the operation has completed. – Holger May 17 '21 at 07:28
@Holger yes, fully agree: all the blocking operations would need to be replaced as you wrote, but that does not seem like a complex issue (just a lot of tedious work in the interpreter code). Anyway, the thread state is exactly as I thought, contrary to what you suggest; it's just that because of JIT compilation to native code, it's stored at a much lower level (OS level), which makes it difficult/impossible to switch it the way I first suggested. – morgwai May 17 '21 at 09:46
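The point about blocking operations in the comments above can be illustrated at user level, assuming a single "carrier" thread modelled as a single-thread executor (the class name `NonBlockingWaitSketch` and the strings are made up for the example). Task A needs a value that task B produces on the same carrier. If A blocked with `signal.get()`, it would occupy the only carrier thread and B could never run. Registering a continuation instead lets the carrier move on to B and resume A afterwards.

```java
import java.util.concurrent.CompletableFuture;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.TimeUnit;

class NonBlockingWaitSketch {

    static String run() throws Exception {
        // Assumption: one carrier thread, so any blocking call on it stalls every other task.
        ExecutorService carrier = Executors.newSingleThreadExecutor();
        try {
            CompletableFuture<String> signal = new CompletableFuture<>();

            // Task A: runs its first part, then "suspends" by registering a continuation
            // on the signal instead of blocking the carrier thread.
            CompletableFuture<String> taskA = CompletableFuture
                    .supplyAsync(() -> "A started", carrier)
                    .thenCombineAsync(signal,
                            (first, value) -> first + ", resumed with " + value,
                            carrier);

            // Task B: runs on the same carrier and produces the value A is waiting for.
            carrier.execute(() -> signal.complete("B's result"));

            // Only the calling thread blocks here, not the carrier.
            return taskA.get(10, TimeUnit.SECONDS);
        } finally {
            carrier.shutdown();
        }
    }

    public static void main(String[] args) throws Exception {
        System.out.println(run()); // prints "A started, resumed with B's result"
    }
}
```

Doing this transparently for every lock and system call inside the runtime, rather than by hand in application code, is the part described above as the real work.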
