Nick ODell has a nice solution, but the compiler has no way of knowing that all of your cases will have exactly one byte of code. That will not be known until the assembler pass. So the compiler has to produce something that would work no matter how much code was produced in each of your cases, and the indirect jump table is really the only way to do that.
Thus, I think in order to get the "ideal" code, with one byte per nop, you're going to have to write the jump logic in assembly as well.
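For reference, the kind of switch-based code I'm talking about looks roughly like the sketch below. This is only an illustration, not anyone's actual code: the names delay_switch and nop are made up, it assumes GCC or Clang for the inline asm syntax, and it is cut down to four cases.

```c
/* Hypothetical illustration of a fall-through switch of one-byte nops.
   delay_switch and nop are made-up names; GCC/Clang asm syntax assumed. */
static inline void nop(void) {
    __asm__ volatile("nop");
}

void delay_switch(unsigned ticks) {
    /* Each case falls through to the next, so entering at case k
       executes k nops. Values above 4 execute none in this sketch. */
    switch (ticks) {
    case 4: nop(); /* fall through */
    case 3: nop(); /* fall through */
    case 2: nop(); /* fall through */
    case 1: nop(); /* fall through */
    default: break;
    }
}
```

The compiler only sees an opaque asm statement in each case, so it cannot turn the dispatch into "label + offset" arithmetic. Depending on the compiler and the number of cases, you would typically get a table of case addresses or a chain of compares instead.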
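Here's what I came up with (for gcc / amd64 / gas on Linux). Here it is on Godbolt. It emits N one-byte nops and jumps N - ticks bytes past the start of that block, so exactly ticks of them execute.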
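#include <stdlib.h>

#define N 1000
#define xstr(s) str(s)   /* expand N first, then stringify it for .rept */
#define str(s) #s

void delay(unsigned ticks) {
    if (ticks <= N) {
        /* Jump N-ticks bytes into a block of N one-byte nops,
           so that exactly `ticks` nops execute. */
        asm("movq $1f, %%rax \n"        /* absolute address of label 1 */
            "addq %0, %%rax \n"         /* skip the first N-ticks nops */
            "jmp *%%rax \n"
            "1: \n"
            ".rept " xstr(N) " \n"
            "nop \n"
            ".endr \n"
            : : "g" ((unsigned long)(N-ticks)): "ax");
    } else {
        abort();
    }
}

int main(void) {
    delay(4);
    return 0;
}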
Note that it has to be compiled with -no-pie, because movq $1f, %rax needs the label's absolute address at link time. If you want it to work as a position-independent executable, you probably need a trick like call 2f ; 2: popq %rax to get the absolute program address into a register.
Of course, there's always the question of whether the overhead of actually getting to this code (the bounds check, the address computation, and the indirect jump itself) will mess up the accuracy of your delay time...