Hardware thread is what allows you to actually run things in parallel (which is not the same as concurrently). These corresspond to number of your CPU cores (with nuances like hyperthreading, which can double the number of cores).
On top of that are OS (kernel) threads. Its an abstraction provided by your OS. The OS will map them to hardware threads. It does this via internal scheduler, and we have little to no control over that. Note that in theory there may be arbitrarily many OS threads (if there are not enough cores to handle them they simply wait for CPU), although the price for so called context switch limits it to few thousands, maybe more.
User threads (a.k.a. green threads, coroutines, etc. they have many names) is an abstraction provided by your software (e.g. programming language and its runtime). They run on top of OS threads, and are mapped to them via internal (but in user space) scheduler. They tend to perform better than OS threads (especially with i/o bound tasks) because they have lower context switch overhead, plus they can take advantage of async apis (e.g. nonblocking sockets) without spawning OS threads (which is costly as well). Since they are lightweight, you can spawn lots of them. Some people claim to run millions of such threads at a time. I've seen tens of thousands without issues.
I've never seen the term "software thread" though. But depending on context it means either user or kernel thread. Unlikely it means anything else.
Btw no real code can run without some OS support. It can be limited, if for example you don't want things to run in parallel. But as soon as you want true parallelism there is no escape from OS threads. The internal scheduler for user threads have to spawn OS threads and map user threads to them in some way. Although typically it is an invisible implemention detail.