38

Consider the following C++17 code:

#include <iostream>
int read;
int main(){
    std::ios_base::sync_with_stdio(false);
    std::cin >> read;
}

It compiles and runs fine on Godbolt with GCC 11.2 and Clang 12.0.1, but results in a runtime error if compiled with the -static flag.

As far as I understand, there is a POSIX(?) function called read (see the read(2) man page), so the example above actually causes an ODR violation, and the program is essentially ill-formed even when compiled without -static. GCC even emits a warning if I try to name a variable malloc: built-in function 'malloc' declared as non-function

Is the program above valid C++17? If not, why not? If so, is it a compiler bug that prevents it from running?

yeputons
  • 2
    @Someprogrammerdude this is where the original question comes from, actually. Great source of random C++ riddles. – yeputons Oct 03 '21 at 11:27
  • 2
    @Someprogrammerdude also, whether `sync_with_stdio(0)` is needed or not depends a lot on the contest. For example, lots of local Russian ICPC contests have very tight time limits and you'd better not use `` at all because it's slow. Obviously, it's a combination of large I/O and specific compilers/default compilation flags used on such competitions. – yeputons Oct 03 '21 at 11:30
  • 3
    Unfortunately, a great source of misconceptions and utterly bad code examples, too. The harm such websites are doing to C++ beginners is immense. This has nothing to do with professional programming. – Evg Oct 03 '21 at 11:32
  • 2
    @Evg Yes, kinda. However, in this case this line is _required_ for the problem to occur. It's not illegal in itself, hence the question. – yeputons Oct 03 '21 at 11:33
  • (That was not to criticize this question in any way.) – Evg Oct 03 '21 at 11:34
  • @BarmakShemirani `fread` is in the standard, so I would be less surprised, I was thinking about [`read(2)`](https://man7.org/linux/man-pages/man2/read.2.html), will clarify. – yeputons Oct 03 '21 at 11:36
  • These are the rules for naming identifiers in C++: https://en.cppreference.com/w/cpp/language/identifiers. So yes, you can name variables "read", "fread", "min", "max". However, any name clashes that occur as a result of using libraries are yours to solve. That's why namespaces are recommended: https://en.cppreference.com/w/cpp/language/namespace – Pepijn Kramer Oct 03 '21 at 11:42
  • Yes, it is allowed. Neither `read` nor `malloc` are reserved identifiers according to the standard. Of course, in some contexts (e.g. where `` is included, which declares `malloc()`) having a variable named `malloc` would be problematical, just as it would be problematical in some contexts having both a user-declared function named `foo()` and a variable named `foo`. – Peter Oct 03 '21 at 11:46
  • I see. If you add `#include ` do you get a compiler error for redefinition of `read`? – Barmak Shemirani Oct 03 '21 at 11:48
  • @BarmakShemirani Yes, I do get an error in that case. However, one doesn't typically include all possible headers in a program. – yeputons Oct 03 '21 at 11:50
  • There can be an argument made putting all of your program into a namespace to avoid clashes with external libraries. – Richard Critten Oct 03 '21 at 11:56
  • 6
    The *global namespace* is the wild, wild west. The program above is valid C++17 **if** it has no ODR violations. Does not matter if the ODR is due to the code you supplied or due to the code the platform supplied. – Eljay Oct 03 '21 at 12:09
  • 1
    @Peter *Yes, it is allowed* Not per POSIX, and this question is tagged `posix` (at the time I write this...). POSIX reserves some identifiers regardless of header inclusion in [**2.2.2 The Name Space**](https://pubs.opengroup.org/onlinepubs/9699919799/functions/V2_chap02.html#tag_15_02_02): "... (skip a lot) ... The following identifiers are reserved regardless of the inclusion of headers: ... malloc ..." Interestingly, `read` is not on that list. – Andrew Henle Oct 03 '21 at 12:25
  • @AndrewHenle: `malloc` is a function from C standard library which is included in C++ standard library. And in C++ an include file is allowed to load symbols from other include files so a cautious programmer should never use a symbol defined anywhere in the standard library. But **according to the standards** `read` is not one of them... – Serge Ballesta Oct 03 '21 at 12:40
  • @AndrewHenle - The question is also tagged C++ (still is, as I write this), and that was the context of my comment. The C++ standard does not require an implementation to comply with, or enforce compliance with, any of the POSIX standards/specifications. And, of course, POSIX compliance is a property of operating systems related to compatibility with unix or "unix-like" systems, rather than a property of toolchains (which aim for compliance with relevant language standards/specifications) or user software (which may or may not assume a unix-compatible host). – Peter Oct 03 '21 at 12:54
  • For what it's worth, Visual C crashes only when you declare `read` as `extern "C"`, as I would expect (and link statically). – Peter - Reinstate Monica Oct 04 '21 at 11:54
  • 1
    @Eljay The "no matter where a conflicting definition comes from" is correct, and I was never aware of the implications. It is pretty scary: How on Earth am I to know what names libraries define *that I never explicitly link to*!? – Peter - Reinstate Monica Oct 04 '21 at 12:00
  • @Peter-ReinstateMonica that is why I have started putting my programs into their own unique namespace and only having `main` (or other entry point) in the global namespace. There is nothing I can do wrt to 3rd-party libraries conflicting with each other. – Richard Critten Oct 04 '21 at 13:26
  • @RichardCritten Seems so ... extravagant. Also scary: It's a runtime error; in a less-used code path it may stay undetected for a while. – Peter - Reinstate Monica Oct 04 '21 at 13:28
  • @Peter-ReinstateMonica The comment from Eljay above expressed it most clearly. Not writing code in the global namespace should be added to the many "best-practice" guides, lint etc that are out there, – Richard Critten Oct 04 '21 at 13:30

2 Answers

17

The code shown is valid (in all C++ Standard versions, I believe). The relevant restrictions on names are all listed in [reserved.names]. Since read is not declared in the C++ standard library, nor in the C standard library, nor in older versions of the standard libraries, and is not otherwise listed there, it's fair game as a name in the global namespace.

So is it an implementation defect that the program fails when built with -static? (Not a "compiler bug": the compiler piece of the toolchain is fine, and nothing forbids a warning on valid code.) It does at least work with default settings (though only because the GNU linker doesn't mind duplicated symbols in an unused object of a dynamic library), and one could argue that's all that's needed for Standard compliance.

We also have this at [intro.compliance]/8:

A conforming implementation may have extensions (including additional library functions), provided they do not alter the behavior of any well-formed program. Implementations are required to diagnose programs that use such extensions that are ill-formed according to this International Standard. Having done so, however, they can compile and execute such programs.

We can consider POSIX functions such an extension. The Standard is intentionally vague on when or how such extensions are enabled. The g++ driver of the GCC toolset links a number of libraries by default, and we can consider that as adding not only the availability of non-standard #include headers but also additional translation units to the program. In theory, different arguments to the g++ driver might make it work without the underlying link step using libc.so. But good luck: one could argue it's a problem that there's no simple way to link only names from the C++ and C standard libraries without including other unreserved names.

(Does "not altering the behavior of a well-formed program" even mean that an implementation extension can't use non-reserved names for the additional libraries? I hope not, but I could see a strict reading implying that.)

So I haven't claimed a definitive answer to the question, but the practical situation is unlikely to change, and a Standard Defect Report would in my opinion be more nit-picking than a useful clarification.

aschepler
  • 6
    [GLIBC manual](https://www.gnu.org/software/libc/manual/html_node/Reserved-Names.html) says: _"The names of all library types, macros, variables and functions that come from the ISO C standard are reserved unconditionally; your program may not redefine these names."_ – Ruslan Oct 03 '21 at 23:06
  • 1
    Also, [reserved.names-3](https://timsong-cpp.github.io/cppwp/n4659/reserved.names#extern.names-3) seems to say the same. – Ruslan Oct 03 '21 at 23:10
  • 2
    @Ruslan: True, but note that `read` is _not_ a C standard function (unlike, for example, `fread` - or `malloc`, come to that). – psmears Oct 04 '21 at 13:24
  • @psmears indeed, didn't think of it. – Ruslan Oct 04 '21 at 13:26
  • Related: If you want to use your own function called `malloc`, you also need the GCC option `-fno-builtin-malloc` to remove the implicit definition of it as an alias for the `__builtin_malloc`. (That's the mechanism by which GCC is able to inline memcpy, or for malloc to know that the returned pointer doesn't alias anything, and is aligned: [What improvements does GCC's \`\_\_builtin\_malloc()\` provide over plain \`malloc()\`?](https://stackoverflow.com/q/26009570).) @Ruslan. This is normally relevant in kernels, which don't link glibc, and some use `-fno-builtin` to disable everything. – Peter Cordes Oct 04 '21 at 15:56
6

Here is some explanation on why it produces a runtime error with -static only.

The https://godbolt.org/z/asKsv95G5 link in the question indicates that the runtime error with -static is Program returned: 139. The output of kill -l in Bash on Linux contains 11) SIGSEGV (and 128 + 11 = 139), so the process exits with the fatal signal SIGSEGV (segmentation fault), indicating an invalid memory reference. The reason is that the process tries to run the contents (4 bytes) of the read variable as machine code. (Eventually std::cin >> ... calls read.) Either something fails in those 4 bytes accidentally interpreted as machine code, or it fails because the memory page containing those 4 bytes is not executable.
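The exit-status arithmetic above can be written out as a small sketch. It assumes the Bash convention of reporting 128 plus the signal number for a process killed by a signal; the helper name signal_from_exit_status is made up for illustration and is not part of any library.

```cpp
// Hypothetical helper: recover the signal number from a shell-style exit
// status, assuming the "128 + signal number" convention used by Bash.
// Returns 0 when the status does not look like a death by signal.
constexpr int signal_from_exit_status(int status) {
    return status > 128 ? status - 128 : 0;
}

// 139 - 128 = 11, and signal 11 is SIGSEGV on Linux (see kill -l).
static_assert(signal_from_exit_status(139) == 11,
              "exit status 139 decodes to signal 11");
```

A status of 128 or below is treated as an ordinary exit code here, which is a simplification chosen for the sketch.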

The reason why it succeeds without -static is that with dynamic linking it's possible to have multiple symbols with the same name (read): one in the program executable, and another one in the shared library (libc.so.6). std::cin >> ... (in libstdc++.so.6) links against libc.so.6, so when the dynamic linker tries to find the symbol read at program load time (to be used by libstdc++.so.6), it will look at libc.so.6 first, finding read there, and ignoring the read symbol in the program executable.
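The clash described here involves the unmangled global symbol read. A minimal sketch of the workaround suggested in the comments on the question (keep your own names in a namespace) might look like the following. The namespace name contest and the helper functions are assumptions made for illustration, and the mangled name in the comment assumes the Itanium C++ ABI used by GCC and Clang on Linux.

```cpp
#include <istream>
#include <sstream>
#include <string>

namespace contest {  // hypothetical namespace name, chosen for illustration

// Namespace-scope variable: under the Itanium C++ ABI its linker symbol is
// mangled (_ZN7contest4readE), so it is a different symbol from the C
// function `read` that the I/O library ends up calling.
int read = 0;

// Reads one integer from `in` into contest::read and returns it.
int read_from(std::istream& in) {
    in >> read;
    return read;
}

// Convenience wrapper for trying the code without touching stdin.
int read_from_text(const std::string& text) {
    std::istringstream in(text);
    return read_from(in);
}

}  // namespace contest
```

In main, the original statement would become std::cin >> contest::read; (or contest::read_from(std::cin)), leaving only main itself in the global namespace.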

pts