Part 1 - why the code below checks st->inverse in the first place

The kiss_fft code has this branch inside a loop:

do {
    if(st->inverse) {
        Fout[m].r = scratch[5].r - scratch[4].i;
        Fout[m].i = scratch[5].i + scratch[4].r;
        Fout[m3].r = scratch[5].r + scratch[4].i;
        Fout[m3].i = scratch[5].i - scratch[4].r;
    }else{
        Fout[m].r = scratch[5].r + scratch[4].i;
        Fout[m].i = scratch[5].i - scratch[4].r;
        Fout[m3].r = scratch[5].r - scratch[4].i;
        Fout[m3].i = scratch[5].i + scratch[4].r;
    }
    ++Fout;
} while (--k); // Fout[] has k*4 elements.

Slightly reordered:

if(st->inverse) {
    Fout[m].r = scratch[5].r - scratch[4].i;
    Fout[m].i = scratch[5].i + scratch[4].r;
    Fout[m3].r = scratch[5].r + scratch[4].i;
    Fout[m3].i = scratch[5].i - scratch[4].r;
}else{
    Fout[m3].r = scratch[5].r - scratch[4].i;
    Fout[m3].i = scratch[5].i + scratch[4].r;
    Fout[m].r = scratch[5].r + scratch[4].i;
    Fout[m].i = scratch[5].i - scratch[4].r;
}

The two code blocks differ only in their use of m and m3, and neither m nor m3 changes inside the loop. Can I simply eliminate this inner-loop branch by swapping m and m3?

// The loop body below is the inverse form, so the swap is needed for the forward transform.
if(!st->inverse) { swap(&m, &m3); }
do {
    Fout[m].r = scratch[5].r - scratch[4].i;
    Fout[m].i = scratch[5].i + scratch[4].r;
    Fout[m3].r = scratch[5].r + scratch[4].i;
    Fout[m3].i = scratch[5].i - scratch[4].r;
    ++Fout;
} while (--k);
Yun
MSalters
  • Will `m` and `m3` be used elsewhere in the code, after the loop? – Some programmer dude Oct 12 '21 at 15:12
  • @Someprogrammerdude: Fair question, but no. [source](https://github.com/mborgerding/kissfft/blob/master/kiss_fft.c). Besides, I could always swap them back. The driver for this kind of optimization is that a 1024-point FFT does 5 levels of these radix-4 butterflies, and these inner loops run 256 times per level. – MSalters Oct 12 '21 at 15:16
  • @Someprogrammerdude: There's indeed a different order between `+` and `-` in the first snippet, but also a different order between `m` and `m3`. That was the driver behind my idea; if I swap the indices (outside the loop) then the code inside the loop becomes equal – MSalters Oct 12 '21 at 15:27
  • @Someprogrammerdude: If `m` and `m3` are swapped in the `else` statements and each set of four statements is sorted, they are character-for-character identical; there is no difference in `+` or `-`. So they can differ in effect only if `Fout` overlaps `scratch`, which I expect is not the case. – Eric Postpischil Oct 12 '21 at 15:28
  • @EricPostpischil: Correct. scratch is a local variable, so the compiler can prove it does not overlap. VS2019 doesn't even bother putting `scratch[]` on the stack, it assigns AVX registers. – MSalters Oct 12 '21 at 15:34
  • The way to know whether you can do this is to implement the change and (1) ensure that the output didn't change for a range of random inputs, and (2) verify that the function actually becomes faster. To me it seems that it's much more useful to test this than to ask here and hope people guess correctly. :) – Cris Luengo Oct 12 '21 at 16:34

1 Answer

I can indeed use that optimization. However, it's not necessary with current-generation compilers that can target AVX: they eliminate that branch as well, using `vpcmpeqd` and `vblendvps`.

MSalters