
The __restrict qualifier in the code below lets the compiler completely unroll the loop, and it shortens the generated assembly by more than half. But what does it mean, and how should it be used correctly?

I did some research before asking and found this, but alas, I do not understand it.

// Compile with -O3 -march=native to see autovectorization
void maxArray(double* __restrict x, double* __restrict y) {
    for (int i = 0; i < 65536; i++) {
        if (y[i] > x[i]) x[i] = y[i];
    }
}

Godbolt's Compiler Explorer

Daniel Langr
Alasdair
  • Your code might be seen as C code, and the C standard [n1570](http://www.open-std.org/jtc1/sc22/wg14/www/docs/n1570.pdf) says something about `restrict`. I do recommend looking inside some C standard. See also [this website](https://en.cppreference.com/w) – Basile Starynkevitch Apr 16 '21 at 06:16
  • It's definitely C++; the keyword is `__restrict`, which is not the same as `restrict` – Alasdair Apr 16 '21 at 06:17
  • Yes, but the C++ standard [n3337](http://www.open-std.org/jtc1/sc22/wg21/docs/papers/2012/n3337.pdf) does not mention `__restrict` or `restrict`. It is a [GCC](http://gcc.gnu.org/) language extension documented [here](https://gcc.gnu.org/onlinedocs/gcc/Restricted-Pointers.html) – Basile Starynkevitch Apr 16 '21 at 06:18
  • `__restrict` is a compiler-specific extension that is (more or less) related to the `restrict` keyword in C (C99 and later). Its purpose is to say that two pointers are not aliased. In your sample code, it means `x` and `y` point at different things (each is treated as an array with 65536 elements, and those arrays cannot overlap). Microsoft also supports `__restrict`; info at https://learn.microsoft.com/en-us/cpp/cpp/extension-restrict?view=msvc-160 – Peter Apr 16 '21 at 06:20
  • @Peter Can you please elaborate on that? It's just saying that the pointers don't point to the same memory space? Can't the compiler work that out by itself? – Alasdair Apr 16 '21 at 06:24
  • If it gives such a big improvement, it seems like it'd be good practice to include it everywhere it's true. – Alasdair Apr 16 '21 at 06:25
  • It is an *annotation* given by the programmer to "promise" that pointers and the pointed-to memory zones don't "overlap". A good compiler is not able to *always* deduce that automatically (because of [Rice's theorem](https://en.wikipedia.org/wiki/Rice's_theorem)...) – Basile Starynkevitch Apr 16 '21 at 06:26
  • re: *"Can't the compiler work that out by itself?"* - How can it, in a separate compilation model? That function can be compiled into some object file and then linked in a lot of other places. The compiler can't see all of them in advance to choose how the loop is to be treated. It needs that extra information. – StoryTeller - Unslander Monica Apr 16 '21 at 07:07
  • @Alasdair Try to think like a compiler. If you want to process some data using vectorization (SIMD) instructions and optimization techniques like loop unrolling, there generally cannot be any dependencies between iterations, which is basically what `restrict` says. Otherwise, you need to assume that writing into `x[i]` can overwrite `y[i]` in the next iteration. – Daniel Langr Apr 16 '21 at 07:09
  • @Alasdair - `restrict` allows the compiler, when compiling the function, to *assume* that the areas of memory will not overlap. It can then optimise the function accordingly (no need to check for overlaps, no need to reorder to allow for the possibility of overlaps). The compiler is not required to verify this at the call site: it is the caller's responsibility to make sure the memory does not overlap (and if the caller was itself passed the pointers, they should typically also be `restrict`). – Peter Apr 16 '21 at 12:41
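To make the promise described in these comments concrete: `__restrict` is about the elements actually accessed through each pointer, not about the pointers coming from different allocations. The sketch below assumes a hypothetical variant `maxArrayN` that takes the element count `n` as a parameter instead of hard-coding 65536, plus a hypothetical helper `maxOfHalves`. Passing the two non-overlapping halves of one buffer keeps the promise.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical variant of maxArray with an explicit length n.
// __restrict is a compiler extension (GCC, Clang, MSVC): the caller promises
// that x[0..n) and y[0..n) do not overlap while the function runs.
void maxArrayN(double* __restrict x, const double* __restrict y, std::size_t n) {
    for (std::size_t i = 0; i < n; ++i) {
        if (y[i] > x[i]) x[i] = y[i];
    }
}

// Combines the two halves of one buffer element-wise.
// This keeps the restrict promise: x covers [0, half), y covers [half, 2*half).
std::vector<double> maxOfHalves(std::vector<double> buf) {
    assert(buf.size() % 2 == 0);  // both halves must have the same length
    const std::size_t half = buf.size() / 2;
    maxArrayN(buf.data(), buf.data() + half, half);
    buf.resize(half);             // keep only the combined first half
    return buf;
}
```

By contrast, calling `maxArrayN(buf.data(), buf.data() + 1, n)` would make the accessed ranges overlap, and with `__restrict` that is undefined behaviour rather than merely slower code.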

1 Answer


Imagine you declare some static double array[100000]; and then your main calls maxArray(array, array + 17);

Without the restrict annotation (or the GCC extension), the compiler must assume that the two array slices may overlap, as they do here, so it is not allowed to aggressively unroll or vectorize the loop in a way that could change the result.
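To see why overlap matters, here is a minimal sketch (the function names are hypothetical) comparing the plain element-by-element loop with a "read all of `y` first, then write `x`" order, which is roughly the freedom a vectorized or heavily unrolled loop takes when it assumes no aliasing. The pointers are deliberately arranged as `x = a + 1`, `y = a` (not the `array + 17` call above) so that each write to `x[i]` is read back as `y[i + 1]` in the next iteration, making the dependency visible.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Scalar order: iteration i reads y[i] after iteration i-1 has written x[i-1].
void maxArrayScalar(double* x, const double* y, std::size_t n) {
    for (std::size_t i = 0; i < n; ++i) {
        if (y[i] > x[i]) x[i] = y[i];
    }
}

// "Batched" order: load every y[i] before storing any x[i].
// This mimics what code that assumes x and y do not overlap is allowed to do.
void maxArrayBatched(double* x, const double* y, std::size_t n) {
    std::vector<double> ys(y, y + n);  // snapshot of y taken before any store
    for (std::size_t i = 0; i < n; ++i) {
        if (ys[i] > x[i]) x[i] = ys[i];
    }
}
```

Starting from `{5, 1, 2, 3, 4}`, `maxArrayScalar(a + 1, a, 4)` propagates the 5 through the whole array, while `maxArrayBatched(b + 1, b, 4)` leaves `{5, 5, 2, 3, 4}`. Since the compiler must preserve the scalar result whenever the pointers might alias, it cannot freely reorder the loads and stores unless it is told (or can check) that they do not overlap.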

With the restrict annotation, you as the programmer promise that this never happens (so you won't call maxArray(array, array + 17); in such a main), and the compiler can then optimize more aggressively. Breaking that promise is undefined behaviour.
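If you cannot guarantee the promise at every call site, one way to still use `__restrict` correctly is to check for overlap yourself and only call the restrict-qualified kernel when the ranges are disjoint. This is a minimal sketch with hypothetical names and an explicit length `n`. It uses `std::less` because comparing pointers into unrelated arrays with plain `<` is unspecified in C++, whereas `std::less` on pointers is guaranteed to yield a consistent total order.

```cpp
#include <cassert>
#include <cstddef>
#include <functional>

// Restrict-qualified kernel: only valid when x[0..n) and y[0..n) are disjoint.
void maxArrayRestrict(double* __restrict x, const double* __restrict y,
                      std::size_t n) {
    for (std::size_t i = 0; i < n; ++i)
        if (y[i] > x[i]) x[i] = y[i];
}

// Plain kernel: correct for any overlap, but harder for the compiler to vectorize.
void maxArrayPlain(double* x, const double* y, std::size_t n) {
    for (std::size_t i = 0; i < n; ++i)
        if (y[i] > x[i]) x[i] = y[i];
}

// Returns true if [a, a+n) and [b, b+n) share at least one element.
bool rangesOverlap(const double* a, const double* b, std::size_t n) {
    std::less<const double*> lt;  // total order, even for unrelated pointers
    return lt(a, b + n) && lt(b, a + n);
}

// Keeps the restrict promise by dispatching on a runtime overlap check.
void maxArraySafe(double* x, const double* y, std::size_t n) {
    if (rangesOverlap(x, y, n))
        maxArrayPlain(x, y, n);     // overlapping: keep the sequential semantics
    else
        maxArrayRestrict(x, y, n);  // disjoint: the __restrict promise holds
}
```

The check costs two comparisons per call, which is negligible next to a 65536-element loop; the trade-off is keeping two copies of the kernel.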

There is a similar difference (in C) between memcpy, whose source and destination must not overlap, and memmove, which handles overlapping regions; an optimizing compiler may generate different code for them.
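The memcpy/memmove contrast in a few lines: `std::memmove` is specified to work when source and destination overlap, while `std::memcpy` on overlapping ranges is undefined behaviour, which leaves implementations free to copy in whatever order is fastest. The hypothetical helper below shifts part of an array by one element in place, which requires `memmove`.

```cpp
#include <cassert>
#include <cstring>

// Shifts a[0..n-1) one slot to the right, inside the same array.
// Source [a, a+n-1) and destination [a+1, a+n) overlap, so memmove is required;
// std::memcpy here would be undefined behaviour.
void shiftRightByOne(int* a, std::size_t n) {
    if (n < 2) return;
    std::memmove(a + 1, a, (n - 1) * sizeof(int));
}
```

For `{1, 2, 3, 4, 5}` this yields `{1, 1, 2, 3, 4}`.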

Be aware of Rice's theorem, which sets theoretical limits on what a compiler can deduce automatically about issues like aliasing. One theoretical framework for aggressive optimizations is abstract interpretation.

If you use GCC (you can look at the generated assembler code produced with g++ -Wall -O3 -S -fverbose-asm), you could, with your own GCC plugin and a lot of effort, improve the optimizations. You could also use GCC developer options to understand the various optimizations, and since GCC is free software, you can study and improve its source code. Budget months of effort for this.

Consider using, if allowed, static analysis tools for C or C++ code such as Frama-C or the Clang static analyzer.

In addition to your debugger (e.g. GDB and its watchpoints), consider using, if allowed, dynamic instrumentation techniques such as valgrind and the address sanitizer. They do slow down your executable a lot!

Basile Starynkevitch