
GCC can suggest functions for attribute pure and attribute const with the flags -Wsuggest-attribute=pure and -Wsuggest-attribute=const.
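As a minimal sketch of what these flags are aimed at, the two functions below are the kind of thing GCC may flag as candidates. The names `offset`, `shifted` and `twice` are made up for illustration, and the use of `-O2` in the command is an assumption (the analysis behind these warnings is tied to optimization).

```c
/* Possible build command (file name is hypothetical):
 *   gcc -O2 -Wsuggest-attribute=pure -Wsuggest-attribute=const -c suggest.c
 */

int offset = 3; /* hypothetical global */

/* Reads a global but writes nothing: a candidate for attribute pure. */
int shifted(int v) {
    return v + offset;
}

/* Depends only on its argument: a candidate for attribute const. */
int twice(int v) {
    return v * 2;
}
```

Whether a warning is actually printed can depend on the GCC version and on how much of the program the compiler can see.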

The GCC documentation says:

Many functions have no effects except the return value and their return value depends only on the parameters and/or global variables. Such a function can be subject to common subexpression elimination and loop optimization just as an arithmetic operator would be. These functions should be declared with the attribute pure.

But what can happen if you attach __attribute__((__pure__)) to a function that doesn't match the above description, and does have side effects? Is it simply the possibility that the function will be called fewer times than you would want it to be, or is it possible to create undefined behaviour or other kinds of serious problems?

Similarly for __attribute__((__const__)) which is stricter again - the documentation states:

Basically this is just slightly more strict class than the pure attribute below, since function is not allowed to read global memory.

But what can actually happen if you attach __attribute__((__const__)) to a function that does access global memory?

I would prefer technical answers with explanations of actual possible scenarios within the scope of GCC / G++, rather than the usual "nasal demons" handwaving that appears whenever undefined behaviour gets mentioned.

Riot

1 Answer


But what can happen if you attach __attribute__((__pure__)) to a function that doesn't match the above description, and does have side effects?

Exactly that: the function may be called fewer times than you expect. Here's a short example:

extern __attribute__((pure)) int mypure(const char *p);

int call_pure() {
  int x = mypure("Hello");
  int y = mypure("Hello");
  return x + y;
}

My version of GCC (4.8.4) is clever enough to remove the second call to mypure (the result is computed as 2*mypure("Hello")). Now imagine mypure were printf: the side effect of printing the string "Hello" a second time would be lost.
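To make the lost side effect observable, here is a self-contained sketch with a function that is deliberately mislabeled as pure. The names `call_count`, `noisy_length` and `call_twice` are hypothetical, and `noinline` is added only so the call is not inlined away before the attribute matters.

```c
#include <string.h>

/* Hypothetical counter, used only to make the side effect visible. */
int call_count = 0;

/* Deliberately mislabeled: the function writes call_count, so it is NOT pure. */
__attribute__((pure, noinline))
int noisy_length(const char *p) {
    call_count++;           /* side effect the attribute promises does not exist */
    return (int)strlen(p);
}

int call_twice(void) {
    int x = noisy_length("Hello");
    int y = noisy_length("Hello"); /* GCC may reuse x instead of calling again */
    return x + y;
}
```

The return value is 10 either way. What differs between builds (for example `-O0` versus `-O2`) is the value you read back from `call_count` afterwards, and since the compiler is entitled to assume the call never touched it, even that read may not reflect what is in memory.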

Note that if I replace call_pure with

extern char s[];

int call_pure() {
  int x = mypure("Hello");
  s[0] = 1;
  int y = mypure("Hello");
  return x + y;
}

both calls will be emitted, because the assignment to s[0] may change the value returned by mypure.

Is it simply the possibility that the function will be called fewer times than you would want it to be, or is it possible to create undefined behaviour or other kinds of serious problems?

Well, it can cause UB indirectly. E.g. here

extern __attribute__((pure)) int get_index();

extern char a[];
int i;
void foo() {
  i = get_index();       // First call returns -1
  a[get_index()] = 0;    // Second call would return 0, if it were made
}

The compiler will most likely drop the second call to get_index() and reuse the first returned value, -1, which results in an out-of-bounds access (a buffer overflow, or technically an underflow).

But what can actually happen if you attach __attribute__((__const__)) to a function that does access global memory?

Let's again take the above example with

int call_pure() {
  int x = mypure("Hello");
  s[0] = 1;
  int y = mypure("Hello");
  return x + y;
}

If mypure were annotated with __attribute__((const)), the compiler would again drop the second call and optimize the return value to 2*mypure(...), despite the store to s[0] in between. If mypure actually reads s, the result will be wrong.
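A self-contained variant of the same situation, as a sketch only: `scale`, `scaled` and `call_const` are hypothetical names, and the function is deliberately mislabeled as const even though it reads a global.

```c
/* Hypothetical global that the mislabeled function reads. */
int scale = 1;

/* Deliberately mislabeled: reading scale breaks the const contract. */
__attribute__((const, noinline))
int scaled(int v) {
    return v * scale;
}

int call_const(void) {
    int x = scaled(10); /* in source order, computed with scale == 1 */
    scale = 2;          /* GCC assumes this store cannot affect scaled() */
    int y = scaled(10); /* may be replaced by x */
    return x + y;       /* 30 if both calls run in source order,
                           20 if the second call is replaced by x */
}
```

Because a const call is treated much like plain arithmetic, the compiler is also free to move it relative to the store, so the exact result depends on the GCC version and optimization level.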

EDIT

I know you asked to avoid hand-waving, but here's a general explanation. By default, a function call blocks many optimizations inside the compiler, because the call has to be treated as a black box that may have arbitrary side effects (modifying any global variable, etc.). Annotating a function with const or pure instead allows the compiler to treat the call more like an expression, which allows more aggressive optimization.

The examples are really too numerous to list. The one I gave above is common subexpression elimination, but we could just as easily demonstrate benefits for loop-invariant code motion, dead code elimination, alias analysis, etc.
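For contrast, here is a sketch of functions that do satisfy the documented contracts, together with callers where the attributes give the optimizer room to work. All names (`square`, `sum`, `fill`, `twice_sum`) are made up for illustration, and whether GCC actually performs each transformation depends on version and flags.

```c
#include <stddef.h>

/* const: the result depends only on the argument value; no memory is read. */
__attribute__((const))
int square(int v) {
    return v * v;
}

/* pure: reads memory through p but modifies nothing. */
__attribute__((pure))
int sum(const int *p, size_t n) {
    int total = 0;
    for (size_t i = 0; i < n; i++)
        total += p[i];
    return total;
}

/* square(k) is loop-invariant. Because it is const, the stores to dst
 * cannot change its result, so the call may be hoisted out of the loop. */
void fill(int *dst, size_t n, int k) {
    for (size_t i = 0; i < n; i++)
        dst[i] = square(k) + (int)i;
}

/* No store happens between the two calls, so one call may be enough. */
int twice_sum(const int *p, size_t n) {
    return sum(p, n) + sum(p, n);
}
```

Note the difference between the two: a pure call such as sum(src, n) inside the loop in fill could not be hoisted as freely, because the stores to dst might alias the memory it reads.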

yugr
  • Thanks for the detailed answer. Do you have any references for any of this information, though? If I'm understanding your explanation correctly, you're suggesting that pure and const attributes basically have the same effect, which is just eliding duplicate calls. Why, then, would there be a restriction on accessing global memory in a const function - and what could the possible effects be? I'm really interested in the specifics of how gcc deals with these attributes (as the documentation is sparse on this subject), rather than speculation about what these attributes *may* in theory do. – Riot Feb 06 '17 at 17:58
  • @Riot I don't think you'll find an exhaustive documentation about this. Many optimization passes in GCC take const/pure info into account (grepping over sources results in 38 files in the middle-end, including loop invariants, dead code elimination, etc.). Basically pure/const attributes inform compiler that calls won't have side effects which can then be used for huge number of optimizations. I only gave one simple example of these - common subexpression elimination (which is a classical optimization technique). – yugr Feb 06 '17 at 18:06
  • @Riot "Why ... restriction on accessing global memory in a const function" - well, it's not a restriction but rather one more hint to compiler that allows it to optimize calls to const functions even more aggressively (i.e. move them past statements which can potentially modify global memory). Again, examples are really too numerous to give, basically _any_ compiler optimization will benefit from const/pure attributes. – yugr Feb 06 '17 at 18:08
  • Thanks for the clarification. So the documentation's statement "function is not allowed to read global memory" really just means that the worst that might happen is such memory accesses may occur at a different time, on the wrong side of memory fences etc? It just strikes me slightly odd to specify "not allowed" in this way, without gcc attempting to enforce that in any way, or warning if it occurs - and I'm especially curious about the possible bugs introduced by accidentally attaching these attributes to functions which don't fulfil the specified obligations in the documentation. – Riot Feb 06 '17 at 18:22
  • @Riot "worst that might happen is such memory accesses may occur at a different time" - the worst is that compiler moves pure function call before write to global memory that it's supposed to read. This will most likely cause pure function to produce incorrect results. – yugr Feb 06 '17 at 18:28
  • @Riot "without gcc attempting to enforce that in any way" - that's a valid point. But note that many performance-related features like strict aliasing or restrict annotations lacked warnings about improper usage for years, until users were too frustrated and forced RedHat to implement them. Pure/const functions are not yet there (not used that often so users don't feel pain that often). – yugr Feb 06 '17 at 18:30
  • @Riot "possible bugs introduced by accidentally attaching these attributes to functions" - you are risking plain UB here, including memory errors and related security vulnerabilities (see the `get_index` example above). – yugr Feb 06 '17 at 18:33
  • Thanks, you're right about other optimisations lacking appropriate warnings too. And, absolutely agree about the risk of knock-on UB - I can definitely see the possible ramifications. I was primarily curious about whether misusing these attributes could cause more noticeable effects such as immediate segfaults - if their only effects are potentially unintended execution order, then that makes them all the more dangerous if potentially misused, as a bug may not surface immediately in many cases. It's interesting to think about. – Riot Feb 06 '17 at 18:38