There was a time when the difference between a CPU and a GPU was very clear, but the line has blurred over the years, both from the CPU side (SIMD vector instructions) and even more from the GPU side, to the point where not only are GPUs doing general-purpose computation, but Nvidia has talked about using the RISC-V instruction set in an upcoming design.
Yet even if they can now compute the same things in principle, in practice they are not good at the same things: GPUs are much faster than CPUs on some workloads and slower on others. Part of that is simply that the implementations optimize for different things, but part of it is presumably architectural.
In particular, GPUs presumably omit some CPU features. For some workloads those features would be baggage, and omitting them lets GPUs run much faster; for other workloads, their absence means GPUs cannot run existing code except through slow and awkward workarounds.
I'm interested in exactly which architectural features GPUs still lack, since those should account for their distinctive characteristics relative to CPUs.
Guesses:
- Full IEEE 754 semantics (e.g. denormal handling, exception flags, all rounding modes)
- Unaligned memory access
- Transparent caches (i.e. on-chip memory that does not have to be managed explicitly by software; see the sketch after this list)
- Cache-coherent SMP
- Paged virtual memory
- Out-of-order execution
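
To make the "transparent caches" guess concrete, here is a rough CUDA sketch (the kernel name and sizes are just my illustration, not from any particular codebase). On a CPU, a plain loop over `in[]` would get its locality from the cache hardware automatically; here the program itself stages data into the fast per-block `__shared__` memory and inserts a barrier before reusing it:

```cuda
#include <cstdio>
#include <cuda_runtime.h>

// The fast on-chip memory is declared and filled explicitly by the
// program, and the barrier is inserted manually. On a CPU, a plain
// loop over in[] would get the same locality from the cache hardware.
__global__ void reverse_within_block(const int *in, int *out, int n)
{
    __shared__ int tile[256];                 // software-managed on-chip memory
    int i = blockIdx.x * blockDim.x + threadIdx.x;
    if (i < n)
        tile[threadIdx.x] = in[i];            // explicit staging from DRAM
    __syncthreads();                          // explicit barrier before reuse
    if (i < n)
        out[i] = tile[blockDim.x - 1 - threadIdx.x];
}

int main()
{
    const int n = 1024;                       // chosen to divide evenly into 256-thread blocks
    int *in, *out;
    cudaMallocManaged(&in,  n * sizeof(int));
    cudaMallocManaged(&out, n * sizeof(int));
    for (int i = 0; i < n; ++i) in[i] = i;

    reverse_within_block<<<n / 256, 256>>>(in, out, n);
    cudaDeviceSynchronize();

    printf("out[0] = %d (expect 255)\n", out[0]);  // block 0 reversed
    cudaFree(in);
    cudaFree(out);
    return 0;
}
```

The point of the sketch is only that the on-chip memory is part of the programming model rather than something the hardware manages behind the program's back; the equivalent CPU code would simply index `in[]` and let the cache hierarchy handle it.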
Are any of those on target? What else have I not thought of? Which features matter most, both in terms of saving transistors and nanojoules per flop, and in terms of making GPUs unable to run CPU workloads?