How to optimize for dividing a constant dividend?

Question

It is well known that gcc optimizes division by a constant very well :)

Now I wonder how dividing a constant (that is, a constant dividend and a variable divisor) can be optimized. gcc does not help me out, and neither does clang.

Maybe I am just not good at searching, but I cannot find any material about optimizing the division of a constant. (In contrast, division by a constant is well documented.)

#include <stdio.h>

int f(int x)
{
    // can I optimize off the idiv opcode here?
    return 33659/x;
}

int main()
{
    int x;
    scanf("%d", &x);
    printf("%d", f(x));
    return 0;
}

EDIT1:

#include <stdio.h>

#define DIVIDEND 33

void f ( unsigned int* arr, int n )
{
    for ( int i = 0; i < n ; i++ )
    {
        arr[i] = DIVIDEND / arr[i];
    }
}

int main()
{
    const int n = 1024;
    unsigned int buf[n];
    for ( int i = 0; i < n; i++ )
    {
        scanf ( "%u", buf + i );
    }
    f ( buf, n );
    for ( int i = 0; i < n; i++ )
    {
        printf ( "%u\n", buf[i] );
    }
    return 0;
}

Compiling this with `clang -O3 -march=native div.c -o div` only unrolls the loop, whereas this version:

#include <stdio.h>

#define DIVIDEND 33
#define DIVISOR DIVIDEND

void f ( unsigned int* arr, int n )
{
    for ( int i = 0; i < n ; i++ )
    {
        //arr[i] = DIVIDEND / arr[i];
        arr[i] = arr[i] / DIVISOR;
    }
}

int main()
{
    const int n = 1024;
    unsigned int buf[n];
    for ( int i = 0; i < n; i++ )
    {
        scanf ( "%u", buf + i );
    }
    f ( buf, n );
    for ( int i = 0; i < n; i++ )
    {
        printf ( "%u\n", buf[i] );
    }
    return 0;
}

compiled with the same command line yields a pile of terrifying AVX2 code. (Remember that division by a constant is rewritten into shift+mul+add, which can be vectorized!)
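
For illustration, the divide-by-constant rewrite can be done by hand for DIVISOR = 33. This is only a sketch: `div_by_33` is a hypothetical helper name, and the constant is one valid choice, ceil(2^37 / 33), which is not necessarily the one gcc or clang emits.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical helper: computes x / 33 with one multiply and one shift.
   0xF83E0F84 == ceil(2^37 / 33). The result is exact for every 32-bit x
   because 33 * 0xF83E0F84 - 2^37 == 4, which is <= 2^5. */
static inline uint32_t div_by_33(uint32_t x)
{
    /* Widen to 64 bits so the product cannot overflow. */
    return (uint32_t)(((uint64_t)x * 0xF83E0F84u) >> 37);
}
```

No comparable trick applies directly when the constant is the dividend, because the multiplier would have to depend on the variable divisor.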

EDIT2: Thanks, @user2722968! Applying RCPPS makes the program faster.

Here is my experimental implementation using RCPPS for fast constant-dividend division:

https://github.com/ThinerDAS/didactic-spoon/blob/master/div.c

However, I am not sure how to make it more accurate without large overhead.

Neither gcc nor clang showing any optimizations is a good hint that this can in fact not be optimized very well (or at all). — Ben Steffan, Jul 01 '17 at 16:10
It would be interesting to know what optimisations OP thinks might be possible in this case. — High Performance Mark, Jul 01 '17 at 16:25
They are optimizing it if the dividend is 0 :P. I think that's the only case that can be optimized. — Petr Skocik, Jul 01 '17 at 16:28
@PSkocik the 1/x case may be worth optimizing too. But anyway, I can imagine possible optimizations if `x` is constrained enough, e.g. to [1, ... 10] only. — Ped7g, Jul 01 '17 at 16:30
The only optimisation is to declare/define the function as `static` , allowing it to be inlined. — wildplasser, Jul 01 '17 at 17:26
With [Compiler Explorer](https://godbolt.org/) different compilers can be tried and no optimizations are observed. — chus, Jul 01 '17 at 23:43
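
The constrained-range idea from the comments can be sketched as follows, assuming the unsigned DIVIDEND of 33 from EDIT1. Any divisor larger than the dividend gives a quotient of 0, so only DIVIDEND + 1 table entries are needed. The names `quot_table`, `init_quot_table` and `div_const_dividend` are hypothetical, and whether a table lookup actually beats a hardware divide has not been measured here.

```c
#include <assert.h>
#include <stdint.h>

#define DIVIDEND 33u

/* quot_table[x] == DIVIDEND / x for 1 <= x <= DIVIDEND; entry 0 is unused. */
static uint32_t quot_table[DIVIDEND + 1];

/* Fill the table once, before the first lookup. */
static void init_quot_table(void)
{
    for (uint32_t x = 1; x <= DIVIDEND; x++)
        quot_table[x] = DIVIDEND / x;
}

/* Returns DIVIDEND / x without a divide instruction on the hot path. */
static uint32_t div_const_dividend(uint32_t x)
{
    assert(x != 0);                 /* division by zero stays undefined */
    return x > DIVIDEND ? 0 : quot_table[x];
}
```

This only pays off for small dividends; for something like 33659 the table grows accordingly.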

score 1 · Answer 1 · answered Jul 01 '17 at 20:25

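If you can trigger a really good optimization for the "divided by a constant" case, then you might benefit from computing x/33659 that way and taking its reciprocal, which is 33659/x, with the RCPPS instruction (an SSE instruction that also has an AVX form). Note that RCPPS only returns an approximation of the reciprocal.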

user2722968


RCPPS is a good hint! The instruction only gives a very rough approximation, so it is not reliable on its own, but it is very fast. It is hard to get the compiler to emit it, though :( I will try it out. – Thiner Jul 02 '17 at 06:31
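
One possible way to get exact quotients out of RCPPS is sketched below. It is not the code from the linked repository. It assumes x86 with SSE2, the dividend 33659 from the question, and divisors in the range 1 to INT32_MAX. `div_const_dividend_rcp` is a hypothetical name. Intel documents the relative error of RCPPS as at most 1.5 × 2^-12, so the sketch adds one Newton-Raphson step and then fixes the truncated estimate by at most one using the integer remainder. The cost of those extra steps has not been benchmarked here.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <xmmintrin.h>  /* SSE: _mm_rcp_ps */
#include <emmintrin.h>  /* SSE2: int <-> float conversions */

#define DIVIDEND 33659

/* Hypothetical helper: out[i] = DIVIDEND / in[i], assuming in[i] >= 1. */
static void div_const_dividend_rcp(const int32_t *in, int32_t *out, size_t n)
{
    const __m128 dividend = _mm_set1_ps((float)DIVIDEND);
    const __m128 two = _mm_set1_ps(2.0f);
    size_t i = 0;

    for (; i + 4 <= n; i += 4) {
        __m128i xi = _mm_loadu_si128((const __m128i *)(in + i));
        __m128 x = _mm_cvtepi32_ps(xi);
        __m128 r = _mm_rcp_ps(x);  /* rough estimate of 1/x */
        /* One Newton-Raphson step: r = r * (2 - x * r). */
        r = _mm_mul_ps(r, _mm_sub_ps(two, _mm_mul_ps(x, r)));
        /* Truncate DIVIDEND * (1/x) to an integer estimate. */
        __m128i q = _mm_cvttps_epi32(_mm_mul_ps(dividend, r));
        int32_t est[4];
        _mm_storeu_si128((__m128i *)est, q);

        /* Scalar fix-up: move the estimate by one if the remainder
           falls outside [0, x). */
        for (int k = 0; k < 4; k++) {
            int32_t d = in[i + k];
            assert(d > 0);
            int64_t rem = (int64_t)DIVIDEND - (int64_t)est[k] * d;
            if (rem < 0)
                est[k]--;
            else if (rem >= d)
                est[k]++;
            out[i + k] = est[k];
        }
    }
    /* Leftover elements use the ordinary division. */
    for (; i < n; i++)
        out[i] = DIVIDEND / in[i];
}
```

The single fix-up step relies on the refined estimate being off by less than 1, which should hold for a dividend of this size but would need to be rechecked for much larger dividends.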
