1

I'm trying to write this inline assembly, which returns a random number using the rdrand instruction. The number is stored in the eax register, and then moved to the rng_num variable. But I get the error that is in the title.

    uint32_t rng_num;
    asm volatile("movl $100, %ecx\n\t"
                 "__trng_cpu_ret:\n\t"
                 "rdrand %%eax\n\t"
                 "jnc .__trng_cpu_end\n\t"
                 "loop __trng_cpu_ret\n\t"
                 ".__trng_cpu_fail:\n\t"
                 "movl $0, %%eax\n\t"
                 ".__trng_cpu_end:\n\t"
                 "ret\n\t"
                  : "=r" (rng_num)
                  :
                  :"%eax");

This is the original x86 Intel syntax code:

mov ecx, 100   ;number of retries
retry:
    rdrand eax
    jnc .done      ;carry flag is clear on success
    loop retry
.fail:
    ;no random number available
.done:
    ;random number is is EAX
Peter Cordes
  • 328,167
  • 45
  • 605
  • 847
  • You need `%%ecx` instead of `%ecx`. That said, your inline assembly makes no sense. Why is there a `ret`? Why do you randomly write data to `eax` instead of using the register the compiler allocates for you? If you like, I can write an answer showing how to write this correctly. – fuz Apr 19 '19 at 22:04
  • 3
    Note that you could use `_rdrand32_step` from `immintrin.h` to avoid the inline assembly. – fuz Apr 19 '19 at 22:05
  • Didn't know the existance of these headers files. But, how I use `_rdrand32_step`? I've readed the header file code but I still confused about how to use it. An answer would be useful. –  Apr 19 '19 at 22:13
  • Refer to [this guide](https://software.intel.com/sites/landingpage/IntrinsicsGuide). Basically, it returns whether `rdrand` was succesful or not and if it was, writes the random number read to the object pointed to by the argument. – fuz Apr 19 '19 at 22:15
  • @fuz I forget to say that I am in an bare metal env, so when I try to use `immintrin.h` I get an compiling error, because that headers files needs `stdlib.h`. So I will need to use inline assembly. –  Apr 19 '19 at 22:22
  • 2
    See [Working example...](//stackoverflow.com/q/43389380) for an example of using the raw `__builtin_ia32_rdrand64_step` builtin, without the intrinsic wrapper. You do *not* need inline assembly. (But you should really figure out how you can `#include `, maybe a modified copy or manually defining a couple types and macros, because `immintrin.h` itself only actually defines inline wrappers for intrinsics, nothing that needs library calls. But really **https://gcc.gnu.org/wiki/DontUseInlineAsm**, especially if you think the slow `loop` instruction is a good idea. – Peter Cordes Apr 19 '19 at 22:34

1 Answers1

3

The correct answer, as mentioned by fuz and Peter in the comments, is to not use inline assembly.

But here are a couple ways to write this in inline assembly.

    uint32_t rng_num;
    int iterations = 100;
    asm volatile("1: rdrand %0\n\t"
                 "dec %1\n\t"
                 "ja 1b\n\t"    // jump if CF=0 (from rdrand) and ZF=0 (from dec)
                 : "=r" (rng_num), "+r"(iterations));

    // alternative that doesn't need partial-flag merging
    asm volatile("1: rdrand %0\n\t"
                 "jc 2f\n\t"
                 "dec %1\n\t"
                 "jnz 1b\n\t"
                 "2:\n\t"
                 : "=r" (rng_num), "+r"(iterations));

Notes:
- These rely on rdrand setting the destination to 0 when it fails.
- The ja instruction checks both the C flag from the rdrand instruction and also the Z flag from the dec. This may be less efficient than using two separate branches, as in the second example, depending on the cost of combining the two partial registers. I'm sure Peter can provide details. (Peter says: no partial flag stalls on CPUs new enough to have RDRAND, should be fine.)

Here's a list of problems in the code in the question:
- Doesn't use %% prefix on ecx register name.
- Uses ecx without a clobber.
- Checks CF=0 for success of rdrand instead of CF=1.
- Uses label names that are not local to the inline assembly.
- Doesn't use output register.
- Returns zero to indicate timeout instead of using a separate error indication. [Note, I didn't fix this one.]
- Uses loop instruction.
- Uses ret instruction within inline assembly.

Peter Cordes
  • 328,167
  • 45
  • 605
  • 847
prl
  • 11,716
  • 2
  • 13
  • 31
  • 2
    I think CPUs new enough to support `rdrand` avoid partial-flag stalls. At worst it will cost an extra uop that gets inserted automatically to merge flags for `ja` to read, but that shouldn't stall the front-end for more than 1 cycle at worst. And for execution it's off the critical path (because of branch prediction + speculative execution). RDRAND is quite slow, so RDRAND latency probably hides any front-end problem. (Like 16 uops but one per 460 cycle throughput on Skylake. IDK if it's faster if you aren't running it back-to-back, like maybe if each core keeps a buffer of ready RNG data?) – Peter Cordes Apr 20 '19 at 14:42
  • Nicely done. Personally, I am frustrated by the `iterations` parameter. What are you supposed to do when it fails? Intel gives no advice. Buy a new CPU? – vy32 Jul 28 '20 at 19:57
  • @vy32, If you’re using rdrand for a security application, either use a different source of randomness or the application should fail. When I use rdrand, I don’t even loop; i just execute rdrand once and use the result. For a non-security application, the possibility that rdrand could fail is insignificant. – prl Jul 29 '20 at 00:38
  • Thanks! We actually need on the order of 2E14 random bits, so we are calling RDRAND *a lot* on many machines. We are checking the CF and looping if it is not set. We're looping until it is set.. We are experimenting with an alternative application where we call RDSEED periodically and use it for a user-level AES-CTR-DRBG implementation, with gives us 12 CSPRNGs per processor rather than just 1. Can you recommend a reference for asm()? I need to learn the syntax; haven't dune much x86 assembler since the early 1980s... – vy32 Jul 29 '20 at 23:53
  • @vy32, please see the first sentence of my answer and the comments under the question. – prl Jul 30 '20 at 02:26
  • Thanks. I read https://gcc.gnu.org/wiki/DontUseInlineAsm. I tried to use the step function but the compiler hates the type of the argument; I can’t find a good simple example. I suppose I could just write an entire function and call it. That would be safer than in-line asm... – vy32 Jul 30 '20 at 02:35
  • 1
    @vy32,see [this](https://stackoverflow.com/q/43389380) and [this](https://software.intel.com/en-us/articles/intel-digital-random-number-generator-drng-software-implementation-guide). – prl Jul 30 '20 at 02:39
  • Thanks! I somehow missed that there is an `rdrand` tag. – vy32 Jul 30 '20 at 21:04