C/C++ indexed jump into a set of NOPs

Question

I want to produce delays with one-clock resolution, so my idea is to have 255 NOPs one after the other and then jump to the last one minus the required delay. So 0 would jump past the last NOP, 1 to the last NOP, and 255 to the first NOP.

I've used indexed function calls before but can't find anything on indexed gotos like this. I also thought of using a switch statement but that seemed to have other instructions in the way.

Any suggestions gratefully received.

Have you looked at the code generated with your switch statement version? — 1201ProgramAlarm, Feb 24 '19 at 23:07
No - but it seems to index by about three clock ticks rather than one so presumably there's some other instructions being output. I'll go and look — Mike Bryant, Feb 24 '19 at 23:16
A quick attempt on godbolt seems to produce what you're looking for: https://godbolt.org/z/146A5A — Nick ODell, Feb 24 '19 at 23:57
Got the assembly output working on AVR-Studio at last, and it produces a vector table of rjmp <16bit address> pointing into the NOPs. So every additional NOP effectively takes 4 bytes rather than one. I was expecting the code to be a direct indexed jump into the NOPs, not via a table of indirect jumps. Code is switch(100-Batt_Percent) { case 100:__asm__ __volatile__("nop"); case 99: __asm__ __volatile__("nop"); .... case 1: __asm__ __volatile__("nop"); — Mike Bryant, Feb 25 '19 at 01:56

Nate Eldredge · Answer 1 · 2019-02-25T05:02:19.673

Nick ODell has a nice solution, but the compiler has no way of knowing that all of your cases will have exactly one byte of code. That will not be known until the assembler pass. So the compiler has to produce something that would work no matter how much code was produced in each of your cases, and the indirect jump table is really the only way to do that.
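To make the fall-through idea concrete, here is a minimal portable sketch of the switch approach from the comments. The name run_steps and the counter are hypothetical, added only so the effect can be observed; in the real version each case body would be a single nop, and the range would go up to 255 rather than 4.

```c
#include <stdio.h>

/* Hypothetical illustration: entering at `case n` and falling through
   runs exactly n case bodies. The counter stands in for the nop. */
static unsigned run_steps(unsigned n) {
    unsigned executed = 0;
    switch (n) {
    case 4: executed++; /* fall through */
    case 3: executed++; /* fall through */
    case 2: executed++; /* fall through */
    case 1: executed++; /* fall through */
    case 0:
    default:
        break;          /* 0 or out of range: run nothing */
    }
    return executed;
}
```

Calling run_steps(3) enters at case 3 and runs three bodies. Since the compiler treats each case body as arbitrary code of unknown size, it typically dispatches through a table of case addresses rather than computing the target from n directly.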

Thus, I think in order to get the "ideal" code, with one byte per nop, you're going to have to write the jump logic in assembly as well.

Here's what I came up with (for gcc / amd64 / gas on Linux). Here it is on godbolt.

#include <stdlib.h>

#define N 1000

#define xstr(s) str(s)
#define str(s) #s


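void delay(unsigned ticks) {
  if (ticks <= N) {
    /* Skip the first (N - ticks) one-byte nops so that exactly
       `ticks` nops execute before control falls out of the block. */
    asm("movq $1f, %%rax \n"      /* absolute address of the first nop */
        "addq %0, %%rax \n"       /* add the number of nops to skip */
        "jmp *%%rax \n"           /* indirect jump into the nop run */
        "1: \n"
        ".rept " xstr(N) " \n"    /* assembler repeats the nop N times */
        "nop \n"
        ".endr \n"
        : : "g" ((unsigned long)(N-ticks)): "ax");
  } else {
    /* A delay longer than the nop run cannot be produced. */
    abort();
  }
}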

int main(void) {
  delay(4);
  return 0;
}

Note that it has to be compiled with -no-pie, because movq $1f loads the label's absolute address. If you want it to work as a position-independent executable, you probably need a trick like call 2f ; 2: popq %rax to get the absolute program address into a register.

Of course, there's always the question whether the overhead of actually getting to this code will mess up the accuracy of your delay time...
