LFENCE is already strong enough (in practice at least) to stop the CPU from actually looking at load instructions after it, but the CPU is free to speculatively load for other reasons.
Stopping that would require some kind of lookahead past the barrier to find out what addresses to disable HW prefetch for. That's not practical at all. CPUID or other serializing instructions aren't any stronger than LFENCE for stopping load prefetches.
The CPU is always allowed to speculatively fetch from memory in WB and WT regions / pages. Intel's optimization manual documents some stuff about the hardware prefetchers in some of their CPU models, so you could in practice avoid doing things before CPUID that are likely to trigger such prefetches.
(WC is weakly-ordered uncacheable+write-combining, but speculative fetch is also allowed there on paper. In real life that probably only happens in the shadow of a branch mispredict, not HW prefetch. It's not normally cacheable like WB and WT.)
If you're microbenchmarking a real CPU, the trick to some kinds of microbenchmarks is to find an access pattern that won't trigger HW prefetching, or to disable the HW prefetchers.
Maybe in theory you could have an x86 CPU that looked ahead in the instruction stream for load/store instructions and speculatively prefetched for them, separate from actually executing them (which Intel's definition of LFENCE would block). I don't think anything would stop it from doing that across CPUID either.
Probably nobody will design such a CPU, because
- It's not worth the transistors / power. Starting prefetch as soon as regular out-of-order execution can get to it is already good enough. And except for absolute / RIP-relative addresses or direct jumps, you'd need register values from the OoO core to get a useful prefetch address.
- Looking past LFENCE / CPUID is perverse; they're rare enough that defeating speculative "execution" of loads past them is part of the point, in the age of Spectre.