I have the following code:

void palette(char* in, int* out, int* palette, int n) {
    for(int i = 0; i < n ; ++i)
    {
        int value = palette[in[i]];
        out[i] = value;
    }
}

Here is the code the compiler generates: https://godbolt.org/z/x3nvrW. I'm wondering whether it is possible to generate better assembly, since `in` and `out` point to different memory locations. Is there some "aliasing" information I can inject into the code so that the generated assembly is better? (I can't really understand the Godbolt output.)

Edit: my question is actually:

  • Knowing that there is no aliasing, can I manually write faster assembly?
  • How can I make standard (or non-standard) C++ code generate this assembly?
lezebulon
  • What do you mean by "Better assembly"? – NotAProgrammer Jan 15 '21 at 14:40
  • The compiler has to assume `in`, `out`, and `palette` can all point to the same place. There is no way to know from the function alone whether this is the case or not. – NathanOliver Jan 15 '21 at 14:41
  • @NotAProgrammer faster assembly ;) – lezebulon Jan 15 '21 at 14:44
  • You can use `__restrict__` (a compiler-specific extension equivalent to `restrict` in C), but that doesn't change the assembly – Artyer Jan 15 '21 at 14:45
  • @NathanOliver I understand that :) My question is how can I change things (maybe has to be out of the function) so I can tell the program that they never point to the same place. Maybe it can be non-portable – lezebulon Jan 15 '21 at 14:45
  • I don't see how aliasing (knowing that there isn't any) could substantially affect the performance of this code. – 500 - Internal Server Error Jan 15 '21 at 14:46
  • @lezebulon AFAIK, there isn't a way to do that. Functions are compiled in isolation, so they don't really know about the outside world except for any global variables they may use. Inlining might remove the aliasing, but I'm not sure. – NathanOliver Jan 15 '21 at 14:49
  • @500-InternalServerError I would assume that the current assembly is slower because the compiler has to do extra work with aliasing – lezebulon Jan 15 '21 at 14:51
  • I've edited the question – lezebulon Jan 15 '21 at 14:59

1 Answer

The generated assembly looks tight enough that it will hit memory bandwidth limits. That means you can't fix it with a slight reorganization of instructions. Hand-written assembly can be faster if it carefully prefetches memory and beats the CPU's own prefetcher predictions.

In other words, you'd have to target one very specific CPU model and know a whole lot about its memory architecture. A typical C++ compiler generates code that works reasonably well everywhere, such as the assembly shown.

The fundamental reason is that caches are pretty effective at sorting out aliasing using parallel hardware, and can do so at a rate of many GB/second.
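
For reference, the no-aliasing promise discussed in the comments could be spelled roughly as follows. This is only a sketch under a few assumptions: `__restrict__` is a GCC/Clang extension rather than standard C++ (MSVC spells it `__restrict`), the name `apply_palette` is made up for illustration, and the input is taken as `unsigned char` on the assumption that palette indices are meant to be 0..255 (a plain `char` may be signed and would then index before the start of the table). As already noted in the comments, this may well leave the generated loop unchanged.

```cpp
#include <cstddef>

// Hypothetical variant of the question's function.
// __restrict__ promises the compiler that, within this function, the three
// pointers do not refer to overlapping memory. Breaking that promise is
// undefined behaviour, so it is only safe when the caller guarantees it.
void apply_palette(const unsigned char* __restrict__ in,
                   int* __restrict__ out,
                   const int* __restrict__ table,
                   std::size_t n)
{
    for (std::size_t i = 0; i < n; ++i) {
        // unsigned char keeps the index in the range 0..255
        out[i] = table[in[i]];
    }
}
```

Comparing the output of this version with the original on Godbolt (same compiler and flags) is the simplest way to see whether the annotation makes any difference for a given target.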

MSalters