Back in Java 1.1, all threads ran on a single core (not taking advantage of the machine's multiple cores/CPUs) and were scheduled by the JVM in user space (so-called green threads).
Around Java 1.2 / 1.3 (depending on the underlying OS), a change was made and Java Thread objects were mapped to OS threads (pthreads in the case of Linux). This takes full advantage of multiple cores, but OTOH creating a thread became very expensive in terms of memory (because each OS thread reserves a large initial stack), which heavily limits the number of concurrent requests a single machine can handle in the thread-per-request model. This forced server-side architectures to switch to the asynchronous model (the non-blocking I/O package java.nio was introduced, AsyncContext was added to the servlet API, etc.), which has been confusing several generations of Java server-side devs to this day: at first glance most APIs look as if they were intended for the thread-per-request model, and one needs to read the API documentation carefully to find the async capabilities bolted onto them on the side.
Only recently has Project Loom set out to deliver lightweight threads that are backed by a thread pool (a Java thread pool of "old-style" Java threads, which in turn map to OS threads), thus combining the advantages: threads that are cheap to create in large quantities, that do utilize multiple cores, and that can be cheaply suspended on blocking operations (such as I/O).
Why is this happening only now, after 20 years, instead of right away in Java 1.3? I.e., why were Java threads made to map 1-1 to OS threads instead of being backed (executed) by a JVM-internal, fixed-size pool of OS threads sized to the number of available CPU cores?
Is it perhaps difficult to implement in the JVM?
It seems not much more complex than all the asynchronous programming that Java server-side devs have been forced to do for the last 20 years, and that C/C++ devs have always been doing, but maybe I'm missing something.
Another possibility is that there is some blocking obstacle in the architectural design of the JVM that prevents it from being implemented this way.
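To make the pool-backed model asked about above more concrete, here is a minimal library-level sketch of it, assuming plain java.util.concurrent and nothing from Loom. The class name PoolBackedTasks and the method runTasks are hypothetical names chosen for illustration. It multiplexes many small tasks over a fixed pool with one OS-backed thread per available core:

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;
import java.util.concurrent.atomic.AtomicInteger;

// Hypothetical illustration: many logical tasks, few OS threads.
public class PoolBackedTasks {

    // Runs taskCount small tasks on a fixed pool sized to the CPU core count
    // and returns how many of them completed.
    public static int runTasks(int taskCount) throws Exception {
        int cores = Runtime.getRuntime().availableProcessors();
        ExecutorService carriers = Executors.newFixedThreadPool(cores);
        AtomicInteger completed = new AtomicInteger();
        try {
            List<Future<?>> futures = new ArrayList<>();
            for (int i = 0; i < taskCount; i++) {
                // Each task is just a heap object until a pool thread picks it up.
                futures.add(carriers.submit(() -> { completed.incrementAndGet(); }));
            }
            for (Future<?> f : futures) {
                f.get(); // wait for each task to finish
            }
        } finally {
            carriers.shutdown();
        }
        return completed.get();
    }

    public static void main(String[] args) throws Exception {
        System.out.println(runTasks(10_000)); // prints 10000
    }
}
```

What this sketch cannot do is the crux of the question: if a task blocks (on I/O, a lock, sleep), it blocks the pool thread it runs on, because a library has no way to suspend a task mid-call and resume it later on another thread. That suspend/resume step is the part that would need support inside the JVM itself.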
UPDATE:
The description of Project Loom's architecture was updated according to the comments: many thanks!