
Consider the following program with two compilation units.


// a.hpp

class A {
public:  // members of a class are private by default; get() must be accessible to f() and g()
  static const char * get() { return "foo"; }
};

void f();

// a.cpp

#include "a.hpp"
#include <iostream>

void f() {
  std::cout << A::get() << std::endl;
}

// main.cpp

#include "a.hpp"
#include <iostream>

void g() {
  std::cout << A::get() << std::endl;
}

int main() {
  f();
  g();
}

It is quite common to need to create global string constants for some reason or other. Doing this in the totally naive way causes linker problems. Usually, people put a declaration in the header and a definition in a single compilation unit, or use macros.
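For reference, the conventional "declaration in the header, definition in a single compilation unit" pattern can be sketched roughly as follows. Both pieces are collapsed into one listing here so that it compiles on its own; the names `kGreeting`, `check_greeting` and `greeting_is_foo` are hypothetical and only serve the illustration.

```cpp
#include <cassert>
#include <cstring>

// --- would live in a header included by every translation unit ---
// Declaration only: no storage is allocated here.
extern const char* const kGreeting;  // hypothetical name

// --- would live in exactly one .cpp file ---
// The single definition. It has external linkage because of the
// preceding extern declaration.
const char* const kGreeting = "foo";

// Hypothetical helper: reports whether the constant has the expected contents.
bool greeting_is_foo() { return std::strcmp(kGreeting, "foo") == 0; }

// Hypothetical helper: aborts if the constant does not hold "foo".
void check_greeting() { assert(greeting_is_foo()); }
```

Every translation unit that includes the declaration sees the same object, so there is exactly one `kGreeting` in the program and no duplicate-definition error at link time.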

I had been under the impression that this way of doing it (shown above) with a function was "okay", because it is an inline function and the linker eliminates any duplicate copies that are produced, and programs written using this pattern seem to work fine. However, now I have my doubts about whether it's actually legitimate.

The function A::get is odr-used in two different translation units, but it is implicitly inline since it is a class member.

In [basic.def.odr.6] it states:

There can be more than one definition of a ... inline function with external linkage (7.1.2)... in a program provided that each definition appears in a different translation unit, and provided the definitions satisfy the following requirements. Given such an entity named D defined in more than one translation unit, then
- each definition of D shall consist of the same sequence of tokens; and
- in each definition of D, corresponding names, looked up according to 3.4, shall refer to an entity defined within the definition of D, or shall refer to the same entity, after overload resolution (13.3) and after matching of partial template specialization (14.8.3), except that a name can refer to a non-volatile const object with internal or no linkage if the object has the same literal type in all definitions of D, and the object is initialized with a constant expression (5.19), and the object is not odr-used, and the object has the same value in all definitions of D; and
- in each definition of D, corresponding entities shall have the same language linkage; and
- ... (more conditions that don't seem relevant)

If the definitions of D satisfy all these requirements, then the program shall behave as if there were a single definition of D. If the definitions of D do not satisfy these requirements, then the behavior is undefined.

In my example program, the two definitions (one in each translation unit) each correspond to the same sequence of tokens. (This is why I originally thought it was okay.)

However, it's not clear that the second condition is satisfied. Because, the name "foo" might not correspond to the same object in the two compilation units -- it's potentially a "different" string literal in each, no?

I tried changing the program:

  static const void * get() { return static_cast<const void*>("foo"); }

so that it prints the address of the string literal, and I get the same address, however I'm not sure if that's guaranteed to happen.
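A single-file sketch of that kind of experiment is below. It only covers one translation unit, so it cannot reproduce the two-unit case from the question. The second class `B` and the helper names are hypothetical. The sketch deliberately does not state what `same_address()` returns, since that depends on the implementation; only the contents are certain to match.

```cpp
#include <cassert>
#include <cstring>

struct A {
  static const void* get() { return static_cast<const void*>("foo"); }
};

// A second, textually identical literal in an unrelated function
// (hypothetical, for comparison only).
struct B {
  static const void* get() { return static_cast<const void*>("foo"); }
};

// Compares the two addresses. The result may be true or false depending
// on whether the implementation merges identical string literals.
bool same_address() { return A::get() == B::get(); }

// Compares the characters the two pointers refer to. This is always true.
bool same_contents() {
  return std::strcmp(static_cast<const char*>(A::get()),
                     static_cast<const char*>(B::get())) == 0;
}
```

Observing equal addresses on one compiler therefore says little about what is guaranteed; the pointer comparison is the part that may differ between implementations and optimization settings.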

Does it fall under "... shall refer to an entity defined within the definition of D"? Is "foo" considered to be defined within A::get here? It might seem so, but as I understand informally, string literals ultimately cause the compiler to emit some sort of global const char[] which lives in a special segment of the executable. Is that "entity" considered to be within A::get or is that not relevant?

Is "foo" even considered a "name", or does the term "name" refer only to a valid C++ "identifier", such as could be used for a variable or function? On the one hand it says:

[basic][3.4]
A name is a use of an identifier (2.11), operator-function-id (13.5), literal-operator-id (13.5.8), conversion-function-id (12.3.2), or template-id (14.2) that denotes an entity or label (6.6.4, 6.1).

and an identifier is

[lex.name][2.11]
An identifier is an arbitrarily long sequence of letters and digits.

so it seems like a string literal is not a name.

On the other hand in section 5

[expr.prim.general][5.1.1.1]
A string literal is an lvalue; all other literals are prvalues.

Generally, I thought that lvalues have names.

Chris Beck
  • Since this is tagged [language-lawyer], I'll leave it to be answered by someone who'll track down the relevant spec cites; but FWIW, "name" means basically "a thing that can be declared" -- like, a variable-name, a function-name, a class-name, etc. -- and string literals are certainly *not* names. (See http://en.cppreference.com/w/cpp/language/identifiers#Names for a slightly more formal enumeration.) – ruakh Jun 18 '16 at 21:27
  • It's an academic question. As such it's of great interest to those currently creating the new standard, who apparently can't think of the simple solution of plugging the hole if there is one (I think that process was fouled somewhere between C++11 and C++14). In practice an `inline` function produces a discardable linker record, and the linker just selects one single definition, which, in practice, guarantees the uniqueness of the string -- at least if machine code inlining optimization respects that behavior, as it does. – Cheers and hth. - Alf Jun 18 '16 at 21:27
  • If you want to look up the considerations, as I recall I last saw this discussed in a proposal about `inline` data (where they failed to consider plugging the hole). – Cheers and hth. - Alf Jun 18 '16 at 21:31
  • An alternative to the `inline` function is to use the templated constant trick. It relies on ODR exemption for static constants in template classes. And it requires the *machinery* for inline data to be in place, just not directly usable. – Cheers and hth. - Alf Jun 18 '16 at 21:34
  • @Cheersandhth.-Alf: I guess I'm most interested in what will practically work. I understand that `inline` produces discardable linker record, but I guess I don't know what that record consists of. It always contains the string literal entirely? It can't be that the string literal gets converted to a pointer at some earlier stage? Or is that a non-portable assumption. – Chris Beck Jun 18 '16 at 21:34
  • You can have any number of the same string literal with the same contents. However, it's unspecified whether those literals will end up being the same instance. So one instance of `"foo"` may or may not have the same address as another instance of `"foo"`. – Michael Burr Jun 18 '16 at 21:48
  • `"foo"` is not a name. Not all lvalues have names. – T.C. Jun 18 '16 at 22:40
  • "*It is a common problem that string constants cannot usually be defined in header files without causing linkage problems.*" Citation needed. – ildjarn Jun 19 '16 at 00:32
  • @ildjarn: I guess I meant only that, beginners commonly mess it up. I'm sure you don't need a citation for that :p I might rewrite that sentence. – Chris Beck Jun 19 '16 at 01:32
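The "templated constant trick" mentioned in the comments above can be sketched roughly like this. It relies on the fact that a static data member of a class template may be defined in a header that is included in several translation units. The names `NameHolder`, `Name` and `name_is_foo` are hypothetical.

```cpp
#include <cassert>
#include <cstring>

// The template parameter is a dummy; it exists only so that the static
// member below belongs to a class template.
template <class Dummy>
struct NameHolder {
  static const char* const value;  // declaration
};

// Definition of the static member of a class template. Unlike an ordinary
// namespace-scope variable, this may appear in a header included by many
// translation units.
template <class Dummy>
const char* const NameHolder<Dummy>::value = "foo";

// Convenience alias for the one specialization actually used.
typedef NameHolder<void> Name;

// Hypothetical helper: reports whether the constant has the expected contents.
bool name_is_foo() { return std::strcmp(Name::value, "foo") == 0; }
```

Client code would then refer to `Name::value` wherever the string is needed, without a separate .cpp file holding the definition.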

1 Answer


Your last argument is nonsense. "foo" isn't even grammatically a name, but a string-literal. And string literals being lvalues and some lvalues having names does not imply that string literals are or have names. String literals as used in your code do not violate the ODR.
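To illustrate that being an lvalue and having a name are separate things, here is a small compile-time sketch. It uses the rule that `decltype((e))` is an lvalue reference type exactly when the expression `e` is an lvalue. The helper `string_literal_is_lvalue` is a hypothetical name.

```cpp
#include <cassert>
#include <type_traits>
#include <utility>

// A string literal is an lvalue, although it is not a name.
static_assert(std::is_lvalue_reference<decltype(("foo"))>::value,
              "string literal should be an lvalue");

// Other literals are prvalues, so decltype((42)) is plain int.
static_assert(!std::is_reference<decltype((42))>::value,
              "integer literal should be a prvalue");

// Dereferencing a pointer also gives an lvalue with no name of its own.
static_assert(std::is_lvalue_reference<decltype((*std::declval<int*>()))>::value,
              "*p should be an lvalue");

// Hypothetical helper exposing the first check as a runtime-queryable value.
constexpr bool string_literal_is_lvalue() {
  return std::is_lvalue_reference<decltype(("foo"))>::value;
}
```

If any of the three checks were false, the listing would fail to compile.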

In C++11 and C++14 it was actually mandated that a string literal in the body of an inline function with external linkage designate the same object in every TU, but that superfluous and mostly unimplemented rule was removed by CWG 1823.

Because, the name "foo" might not correspond to the same object in the two compilation units -- it's potentially a "different" string literal in each, no?

Correct, but that's irrelevant, because the ODR does not care about specific argument values. If you did somehow manage to get a different entity selected in each TU, e.g. a different function template specialization, that would be problematic, but fortunately string literals are invalid template arguments, so you're gonna have to be clever.

Columbo
  • I guess the most salient feature of being a name, is that name resolution happens. Clearly, namespaces, hiding, and such, don't apply to `"foo"`, so that should have been a big hint to me that it is not a name. – Chris Beck Jun 20 '16 at 17:34
  • IIUC, a string-literal is (re-)evaluated every time control passes through it, just like other literals, except that the result must be an lvalue with static storage duration. That means the standard permits `auto s1=A::get(), s2=A::get();` to give different addresses, as long as they point to lvalues with static storage duration. Am I right? – VainMan Apr 08 '22 at 22:42
  • No, that's not right, because `A::get` is a specific function that's compiled once and will return the same address every time. – Columbo Apr 13 '22 at 06:52