52

I've recently been learning C++ and have realised that string literals in C++ have to be constants, whereas in C, they do not. Here is an example. The following code would be valid in C, but not in C++:

char* str = "Hello, World!";

In order to do the same thing in C++, the following statement has to be used:

const char* str = "Hello, World!";

Why is there a difference?

Peter Mortensen
Serket
  • 26
    *"whereas in C, they do not"* You are incorrect. String literals in C are constant, but they are allowed to be pointed to by a `char*`. You still can't modify the string through that pointer. In C++, they just eliminated that exception to const correctness to avoid confusion and mistakes. – François Andrieux May 04 '20 at 21:26
  • 5
    Because that is how they designed the language. C didn't initially have the `const` keyword, so it would break legacy code if they changed literals to require `const`-qualification after introduction of the keyword. C's string-literals are immutable, though, so changing the contents is undefined behavior even if it's not `const`-qualified. – Christian Gibbons May 04 '20 at 21:26
  • @ChristianGibbons then why does the following code work? `char* str = "Hello World"; str = "Goodbye World";` – Serket May 04 '20 at 21:28
  • 13
    @Serket: That's not modifying the string itself; it's changing the pointer `str` (which is not a constant) to point to a different string. `char *str = "Hello World"; str[0] = 'J';` would be undefined behavior. – Nate Eldredge May 04 '20 at 21:29
  • 1
    @NateEldredge Thanks! Could someone post an official answer for me to mark as correct? – Serket May 04 '20 at 21:30
  • 1
    @FrançoisAndrieux you should probably copy that into an answer. – Marco Bonelli May 04 '20 at 21:30
  • @FrançoisAndrieux C string literals are not necessarily constant. Attempting to modify a string literal (more precisely, the anonymous array object that a string literal refers to) has undefined behavior. A conforming C implementation could make that array modifiable. – Keith Thompson May 04 '20 at 21:53
  • @KeithThompson You can say that for pretty much any example of undefined behavior. It's constant as far as the standard is concerned. – eesiraed May 04 '20 at 22:29
  • @BessieTheCow: For something to be constant as far as the standard is concerned, attempting to modify it would be a constraint or syntax violation. For example, `const int *ptr = ...; *ptr = 42;` must be diagnosed. `char *s = "hello"; *s = 'H';` does not require a diagnostic. The string literal is of type `char[6]`, not `const char[6]` (as it is in C++). (Note also that "constant" and "`const`" are not the same thing. "Constant" means the value is determined at compile time. `const` means read-only. C++ string literals are `const`.) – Keith Thompson May 04 '20 at 23:33
  • @KeithThompson What about `const int x = 5; *(int*)&x = 42;` or `constexpr int x = 5; *(int*)&x = 42;`? That's UB no diagnostic required as far as I can tell. – eesiraed May 04 '20 at 23:50
  • @BessieTheCow Sure, you can always cast away `const` and avoid a required diagnostic. The point is that string literals aren't `const` in the first place. There are three separate rules at play: (a) Modifying an object via a `const`-qualified lvalue is a constraint violation. (b) Modifying an object defined with `const` has UB. (c) Modifying a string literal has UB. – Keith Thompson May 05 '20 at 00:51
  • @BessieTheCow The only reason modification of string literals is treated separately is to avoid breaking existing code. Making string literals `const` would have made `char *s = "hello";` a constraint violation. As it is, it's perfectly valid (but unwise) and invokes UB only if you try to modify the object `s` points to. – Keith Thompson May 05 '20 at 00:53

2 Answers

44

Expanding on Christian Gibbons' answer a bit...

In C, string literals, like "Hello, World!", are stored in arrays of char such that they are visible over the lifetime of the program. String literals are supposed to be immutable, and some implementations will store them in a read-only memory segment (such that attempting to modify the literal's contents will trigger a runtime error). Some implementations don't, and attempting to modify the literal's contents may not trigger a runtime error (it may even appear to work as intended). The C language definition leaves the behavior "undefined" so that the compiler is free to handle the situation however it sees fit.

In C++, string literals are stored in arrays of const char, so that any attempt to modify the literal's contents (without first casting away const) will trigger a diagnostic at compile time.
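The type of a literal can be checked directly. The following is only a minimal sketch, assuming a C++11 or later compiler; the helper name first_char is made up for illustration and is not part of any library.

```cpp
#include <cassert>
#include <type_traits>

// A string literal is an lvalue of type "array of N const char",
// where N includes the terminating '\0' (13 characters + 1 here).
static_assert(std::is_same<decltype("Hello, World!"), const char (&)[14]>::value,
              "a C++ string literal is an array of const char");

// Hypothetical helper: reads the first character of the literal.
char first_char() {
    const char* str = "Hello, World!";  // OK: the array decays to const char*
    // char* bad = "Hello, World!";     // ill-formed since C++11: would drop const
    // str[0] = 'J';                    // does not compile: *str is const char
    assert(str[13] == '\0');            // the terminator is part of the array
    return str[0];
}
```

Uncommenting either of the two commented lines should produce a compile-time diagnostic, which is the safety net described above.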

As Christian points out, the const keyword was not originally a part of C. It was, however, originally part of C++, and it makes using string literals a little safer.

Remember that the const keyword does not mean "store this in read-only memory", it only means "this thing may not be the target of an assignment."

Also remember that, unless it is the operand of the sizeof or unary & operators, or is a string literal used to initialize a character array in a declaration, an expression of type "N-element array of T" will be converted ("decay") to an expression of type "pointer to T" and the value of the expression will be the address of the first element of the array.
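The decay rule and its exceptions can be observed with a few compile-time checks. This is a sketch under the assumption of a C++11 or later compiler; the names greeting and decays_to_first_element are invented for the example. (C++ has a few further non-decaying contexts, such as decltype and binding to a reference, that the C wording above does not cover.)

```cpp
#include <cassert>
#include <type_traits>

// A literal used to initialize a character array is copied, not decayed:
// greeting is a writable array of 6 chars ('H','e','l','l','o','\0').
char greeting[] = "Hello";

// sizeof sees the whole array, not a pointer.
static_assert(sizeof(greeting) == 6, "5 letters plus the terminating '\\0'");
static_assert(sizeof("Hello") == 6, "the literal itself is also a 6-element array");

// Unary & yields a pointer to the whole array, not a pointer to a pointer.
static_assert(std::is_same<decltype(&greeting), char (*)[6]>::value,
              "&greeting has type char (*)[6]");

// In an ordinary expression the array decays to a pointer to its first element.
static_assert(std::is_same<decltype(greeting + 0), char*>::value,
              "greeting + 0 has type char*");

// Hypothetical helper: checks that the decayed pointer addresses element 0.
bool decays_to_first_element() {
    char* p = greeting;        // implicit array-to-pointer conversion
    assert(p[5] == '\0');      // same storage as the array
    return p == &greeting[0];
}
```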

In C++, when you write

const char *str = "Hello, world";

the address of the first character of the string is stored to str. You can set str to point to a different string literal:

str = "Goodbye cruel world";

but what you cannot do is modify the contents of the string, something like

str[0] = 'h';

or

strcpy( str, "Something else" );
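If a modifiable string is what you actually need, the usual approach is to copy the literal's characters into storage you own. The sketch below is illustrative only; the function name demo and the strings it builds are assumptions made for this example.

```cpp
#include <cassert>
#include <cstring>
#include <string>

// Hypothetical helper contrasting what is allowed with what is not.
std::string demo() {
    const char* str = "Hello, world";
    str = "Goodbye cruel world";         // OK: only the pointer changes

    // str[0] = 'h';                     // does not compile: *str is const char
    // std::strcpy(str, "Something");    // does not compile: const char* is not char*

    char buffer[] = "Hello, world";      // writable 13-byte copy of the literal
    buffer[0] = 'h';                     // OK: modifies the copy, not the literal
    assert(sizeof(buffer) > std::strlen("Bye"));
    std::strcpy(buffer, "Bye");          // OK: "Bye" plus '\0' fits in 13 bytes

    std::string owned = str;             // std::string also owns its own copy
    owned[0] = 'g';                      // OK: modifies the copy
    return std::string(buffer) + " / " + owned;
}
```

Calling demo() here yields "Bye / goodbye cruel world"; the literals themselves are never written to.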
Peter Mortensen
John Bode
  • 1
    IMO a correct answer. I'd probably stress more that in `const char *s="foo";` the `const` says something only about the pointer (it cannot be used for writing), but says nothing about the pointed-to object. Unfortunately a lot of C++ programmers (and even the standard library) confuse the two, especially with references (i.e. in `const X& x` the keyword `const` is relative to the **reference** only and tells nothing about the referenced object; passing a `const&` is **not** a smart way to pass by value, as you can run into lifetime or aliasing problems). – 6502 May 06 '20 at 06:08
  • 1
    @6502: `const char *str` means that the pointed-to object is `const`, not the pointer. We can write a new value to `str` (point it to a different object), but not to `*str` or `str[i]`. `char * const str` means that the pointer itself is `const` and cannot be written to, but the thing it points to can be. – John Bode May 06 '20 at 10:58
  • No. The declaration `const char *str;` is a declaration of a "read-only pointer", i.e. a pointer that can be used for reading only... but that is a property of the pointer; it tells nothing about the const-ness of the pointed-to object. The pointed-to object can change; it simply cannot be changed **using that pointer** (but there can be, for example, other read/write pointers to the same object). A `const char *` is not a pointer to a char that is const, it's a pointer to a char that cannot be used for writing. – 6502 May 06 '20 at 12:56
  • @6502: =*sigh*= You're right - I hadn't had my coffee yet. – John Bode May 08 '20 at 14:05
  • @6502 I'm confused now. In [this post about the clockwise/spiral rule](http://c-faq.com/decl/spiral.anderson.html), David Anderson states that `const char *chptr;` means *chptr is a pointer to a char constant*. Is he wrong? Or is it just a simplification and one has to add that the pointed-to object is only constant from the viewpoint of the current pointer? – Splines Jan 07 '22 at 19:54
  • 1
    @Splines: code like `char x = 'A'; const char *p = &x; printf("%c\n", *p); x = 'B'; printf("%c\n", *p);` is perfectly valid and the output must be `A\nB\n` (see https://godbolt.org/z/rjx64TjMG). So `p` is not a pointer to a constant because what is pointed to changes! Sure you cannot change it using `p` (without a cast) but this doesn't make the pointed object a constant in the absolute sense. `p` is a pointer that cannot be used to write (that's why I would prefer "readonly pointer"). Note however that **string literals** ARE constants, thus trying to modify them is UB. – 6502 Jan 07 '22 at 20:53
  • @6502 Thanks for this clarification and the example. Got it now. – Splines Jan 07 '22 at 20:59
27

C didn't initially have the const keyword, so it would break legacy code if they changed literals to require const-qualification after introduction of the keyword. C's string-literals are immutable, though, so changing the contents is undefined behavior even if it's not const-qualified.

C++, on the other hand, was designed with the const keyword. Initially, C++ did allow string literals to be assigned to non-const-qualified char *s, presumably for compatibility with existing C code. The C++03 standard, however, marked this conversion as deprecated, and C++11 removed it entirely, rather than allowing the dissonance to continue in perpetuity. I would guess the amount of legacy C++ code relying on non-const-qualified char *s pointing to string literals was small enough that it was a worthy trade-off.

Peter Mortensen
Christian Gibbons