I am intimately familiar with neither Python nor Node.js, but I can help you out with the rest.
In my estimation, the easiest way to understand user threads is to understand how the kernel manages (kernel) threads in a single-core system. In such a system, there is only one hardware thread, i.e. only one thread can physically be in execution on the CPU at any given time. Clearly, then, in order to make multiple threads appear to run simultaneously, the kernel needs to multiplex between them. This is called time sharing: the kernel juggles the threads, running each for just a bit (usually on the order of, say, 10 ms) before switching to another one. The time quantum given to each thread is short enough that the threads appear to run in parallel, while in reality they run one after another in small slices. This kind of apparent parallelism is called concurrency; true parallelism requires hardware support, i.e. more than one hardware thread.
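To make the time-sharing idea concrete, here is a small simulation of round-robin scheduling on a single core. It is only a sketch: the function name `time_share`, the thread names and the workloads are made up for illustration, and a real kernel scheduler is far more involved (priorities, blocking on I/O, and so on).

```python
from collections import deque

def time_share(work_ms, quantum_ms=10):
    """Simulate round-robin time sharing on one core.

    work_ms maps a (hypothetical) thread name to the CPU time it needs.
    Returns a timeline of (thread, start_ms, end_ms) slices.
    """
    ready = deque(work_ms.items())
    clock = 0
    timeline = []
    while ready:
        name, remaining = ready.popleft()
        run = min(quantum_ms, remaining)      # run for at most one quantum
        timeline.append((name, clock, clock + run))
        clock += run
        if remaining > run:                   # not finished: back of the queue
            ready.append((name, remaining - run))
    return timeline

timeline = time_share({"T1": 25, "T2": 10, "T3": 15})
for name, start, end in timeline:
    print(f"{start:3d}-{end:3d} ms: {name}")
```

Only one thread occupies the core in any slice, yet all three make progress within a few tens of milliseconds, which is what makes them look parallel from the outside.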
User threads are just the same kind of multiplexing taken one step further.
Every process initially starts with only one kernel thread, and it will not get more unless it explicitly asks the kernel for them. Therefore, in such a single-threaded process, all code is executed on the same kernel thread. This includes the user-space threading library responsible for creating and managing the user threads, as well as the user threads themselves. Creating user threads doesn't result in kernel threads being created - that is exactly the point of user-space threads. The library manages the user threads it creates in much the same way that the kernel manages kernel threads; they both perform thread scheduling, which means that user threads, too, are run in turns for a short time, one at a time.
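As a rough illustration, a user-space scheduler can be sketched with plain Python generators. The assumptions here are mine: each generator plays the role of a user thread, `yield` is the point where it hands control back, and `worker` and `run_round_robin` are hypothetical names. Note that this toy is cooperative (a user thread must yield voluntarily), whereas a real threading library may also preempt.

```python
import threading
from collections import deque

def worker(name, steps, log):
    """A toy 'user thread': does one step of work, then yields control."""
    for i in range(steps):
        # Record which kernel thread is actually executing this code.
        log.append((name, i, threading.get_ident()))
        yield  # switch point: control returns to the scheduler

def run_round_robin(tasks):
    """Toy user-space scheduler: resume each task in turn until all finish."""
    ready = deque(tasks)
    while ready:
        task = ready.popleft()
        try:
            next(task)            # run the task until its next yield
        except StopIteration:
            continue              # task finished; do not re-queue it
        ready.append(task)

log = []
run_round_robin([worker("A", 2, log), worker("B", 3, log)])
order = [(name, i) for name, i, _ in log]
kernel_thread_ids = {tid for _, _, tid in log}
print(order)
print(len(kernel_thread_ids), "kernel thread used")
```

The two user threads interleave, but every step reports the same kernel thread identifier: the kernel never learns that "A" and "B" exist.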
You'll notice that this is highly analogous to the kernel thread scheduling described above: in this analogy, the single kernel thread the process is running on is the single core of the CPU, user threads are kernel threads and the user-space threading library is the kernel.
The situation remains largely the same if the process is running on multiple kernel threads (i.e. it has requested more threads from the kernel via a system call). User threads are just data structures local to the kernel thread they are run on, and the code executed on each user thread is simply code executed on the CPU in the context of that kernel thread; when the library switches from one user thread to another, the kernel thread essentially performs a jump and starts executing code in another location (indicated by the new user thread's saved instruction pointer). Therefore, it is entirely possible to create multiple user threads from multiple kernel threads, although this would pretty much defeat the purpose of using user threads in the first place.
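The multi-kernel-thread case can be sketched in the same spirit. Here two kernel threads (created with Python's `threading.Thread`) each run their own tiny scheduler over their own generator-based user threads. The names `kt-0`, `kt-1`, `user_thread` and `scheduler` are hypothetical, and the generators again stand in for real user threads.

```python
import threading

def user_thread(label, steps, log):
    """Toy user thread: logs the kernel thread it runs on, then yields."""
    for step in range(steps):
        log.append((threading.current_thread().name, label, step))
        yield

def scheduler(user_threads):
    """Per-kernel-thread scheduler: resume each user thread in turn."""
    ready = list(user_threads)
    while ready:
        still_ready = []
        for ut in ready:
            try:
                next(ut)
                still_ready.append(ut)
            except StopIteration:
                pass  # this user thread has finished
        ready = still_ready

# One private log per kernel thread, so no locking is needed.
logs = {"kt-0": [], "kt-1": []}
kernel_threads = [
    threading.Thread(
        target=scheduler,
        name=name,
        args=([user_thread(f"{name}/u{i}", 2, log) for i in range(2)],),
    )
    for name, log in logs.items()
]
for kt in kernel_threads:
    kt.start()
for kt in kernel_threads:
    kt.join()

for name, log in logs.items():
    print(name, [(label, step) for _, label, step in log])
```

Each user thread only ever runs in the context of the kernel thread whose scheduler owns it; the kernel schedules `kt-0` and `kt-1`, and each of those schedules its own user threads.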
Here is an article about multithreading (concurrency) and multiprocessing (parallelism) in Python you might find interesting.
Finally, a word of warning: there is a lot of misinformation and confusion regarding kernel threads floating around. A kernel thread is not a thread that only executes kernel code (and threads executing kernel code aren't necessarily kernel threads, depending on how you look at it).
I hope this clears it up for you - if not, please ask for clarification and I'll try my best to provide it.