
There was a time when the difference between a CPU and a GPU was very clear, but it has become increasingly blurred over the years, both from the former side (SIMD vector instructions) and even more from the latter, to the point where not only are GPUs doing general computation, but Nvidia talks about using the RISC-V instruction set in their next one.

Yet even if they can now in principle compute the same things, in practice they are not good at the same things. GPUs are much faster than CPUs on some workloads, but slower on others. Clearly this is partly because the implementations optimize for different things, but it is presumably also partly due to architectural differences.

In particular, GPUs presumably omit some features of CPUs. For some workloads those features would be baggage, and omitting them lets GPUs run much faster; for other workloads, their absence means GPUs cannot run existing code except by slow and awkward workarounds.

I'm interested in just what architectural features GPUs still do not have, to account for their distinctive characteristics relative to CPUs.

Guesses:

  • IEEE 754 semantics
  • Unaligned memory access
  • Transparent caches (i.e. on-chip memory that does not have to be managed explicitly by software)
  • Cache-coherent SMP
  • Paged virtual memory
  • Out of order execution

Are any of those on target? What else have I not thought of? Which features are the most important, both in saving transistors/nanojoules per flop on the one hand, and in making GPUs unable to run CPU workloads on the other?

rwallace
    Here is an interesting link: http://cva.stanford.edu/classes/cs99s/papers/myer-sutherland-design-of-display-processors.pdf (Myer, T. H., and Ivan E. Sutherland. "On the design of display processors." Communications of the ACM 11.6 (1968): 410-414.) – ddemidov Nov 13 '20 at 18:36
    @ddemidov: I doubt any paper from 1968 has anything relevant to say about modern highly-parallel GPUs, vs. the different style of SIMD (short vectors) used on modern CPUs. Neither way of exploiting data parallelism was even on the horizon back then, AFAIK. The paper you linked seems to be proposing that programmable GPUs could be useful (e.g. for handling light-pen input), which is very different from the workloads current GPUs were originally designed for (computing 3D graphics, and running shader programs on each pixel.) It has barely any (maybe zero) relevance to this question. – Peter Cordes Nov 16 '20 at 21:18

1 Answer


Branches are a big one. GPU instruction sets have some way to deal with if/then/else, but typically they handle conditionals by masking each result with ones or zeroes rather than by jumping, so both sides of a conditional may end up being executed; a sketch of the pattern follows.
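A minimal sketch, assuming CUDA for concreteness (the kernel names are illustrative, not from any particular codebase): the first kernel uses a data-dependent if/else, which diverges within a warp; the second writes the same conditional as an explicit 0/1 mask, which is closer to how the hardware's execution mask effectively treats the first.

```cuda
// Divergent form: within a warp, lanes that disagree on the condition are
// handled by executing both paths under an execution mask, so the cost is
// roughly the sum of both paths rather than whichever one was taken.
__global__ void scale_divergent(const float *in, float *out, int n)
{
    int i = blockIdx.x * blockDim.x + threadIdx.x;
    if (i >= n) return;

    if (in[i] > 0.0f)
        out[i] = in[i] * 2.0f;   // executes with the non-positive lanes masked off
    else
        out[i] = -in[i];         // then the mask flips and this side executes
}

// Masked form: compute both results and select by a 0/1 mask instead of
// jumping, i.e. the "masking a result with ones or zeroes" idea made explicit.
__global__ void scale_masked(const float *in, float *out, int n)
{
    int i = blockIdx.x * blockDim.x + threadIdx.x;
    if (i >= n) return;

    float then_val = in[i] * 2.0f;
    float else_val = -in[i];
    float mask     = (in[i] > 0.0f) ? 1.0f : 0.0f;
    out[i] = mask * then_val + (1.0f - mask) * else_val;
}
```

Both kernels compute the same result; the second just makes the select-by-mask explicit, which is roughly what happens under the hood for the first whenever threads in the same warp take different sides of the branch.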

Davislor