3

Strings are said to be constant in the C programming language.

So, when I write a statement like char *s = "Hello", I have learned that s points to the memory location of H, since "Hello" is stored in static memory of the program and is immutable.

Does this mean that s is now a pointer to constant data, as in const int a = 3; const int *i = &a;? It seems so, because I can't modify the data (when I try, I get a segmentation fault).

But if that is so, shouldn't the compiler be able to detect this and report that I have assigned qualified data to an unqualified pointer? With char *p, p is a pointer to an unqualified char, so when I write char *p = "Hello", shouldn't it be an error for p, a pointer to unqualified char, to point to const char data?

What am I missing here?

If that is not the case, how is an array of constant characters made immutable?

Tarun Maganti
  • 1
    Have you looked inside the [C11](https://en.wikipedia.org/wiki/C11_(C_standard_revision)) standard document, [n1570](http://www.open-std.org/jtc1/sc22/wg14/www/docs/n1570.pdf)? You really should download then read that specification. BTW ansi-c (and even C99) are obsolete standards. – Basile Starynkevitch Nov 22 '17 at 05:44
  • `variable s` is not const (unless stated), you can still use it to point to other locations. `"Hello"` is stored in `data section` part of your program which is why it is immutable. – qwn Nov 22 '17 at 05:46

2 Answers

4

First of all, a string in C isn't immutable. C doesn't even know a type for strings -- a string is just defined as a sequence of char ending with '\0'.

What you're talking about are string literals, and they can be immutable. The C standard says that attempting to modify a string literal is undefined behavior, yet their type is still an array of plain char (which converts to char *), not const char. So, if you are sure that in your implementation of C, a string literal is writable, you can do so! *)

But your code won't be well-defined C any more and won't work on other platforms with read-only string literals. It will compile, because writing through char * is perfectly fine, but fail at runtime in unpredictable ways (like, possibly, a crash).

Therefore, it's just best practice for portable code to assign string literals only to const char * pointers and, if you need a mutable string, use the string literal as an initializer for a char [].


*) Beware: this is very uncommon. Nowadays you'll find it only with specialized compilers targeting embedded or very old platforms. A modern platform will place string literals in a read-only data segment or similar.

  • Even if you are sure that string literals are mutable in your implementation, mutating one is still undefined behaviour, so you can't do it. Similarly, you could be certain that your implementation doesn't trap on arithmetic overflow --indeed, GCC is documented as not trapping--, but the compiler can still produce surprising results. See https://blog.regehr.org/archives/759 – rici Nov 22 '17 at 07:20
  • Whether or not you believe it to be a "UB trick", the fact is that the compiler might constant fold the boolean `x < x + 1` to `1` even though it would evaluate to `0` for a particular value of `x`, because it would only evaluate to `0` in the case of UB. Similarly, if the compiler knows that the value of `char* p` is a pointer to a string literal, it could choose not to compile `*p = 'a';` it could even choose to not compile the following code in that basic block on the assumption that the programmer must have done something to guarantee that UB doesn't occur. – rici Nov 22 '17 at 07:37
  • Do you have a reference for the reality you refer to? That is, a compiler which makes string literals mutable? If not, it is all theoretical, no? If so, how do you know that the compiler will never acquire the sort of optimisation I'm talking about? When GCC acquired these optimizations, it certainly surprised the Linux authors, amongst other people, who were justifiably confident that their platform did not trap integer overflow. Anyway, that's it for me. – rici Nov 22 '17 at 07:48
  • @rici for example [cc65](http://cc65.github.io/doc/cc65.html) with command-line option `--writable-strings` gives you that guarantee. I still edited the answer to make the **warning** about relying on such things more distinct. But really, UB doesn't mean "you can't do that", it just means "your code isn't well-defined C, so it could break easily". –  Nov 22 '17 at 08:08
3

The syntax char *s = "Hello"; dates from the days when the const keyword was not part of the C specification. It was later kept for backward compatibility. Writing to s[i] leads to undefined behaviour (a segmentation fault was observed in your case on a few runs).

This behaviour (conversion from a string literal, which in C++ is a const char [], to a non-const char *) was supported in C++ briefly until C++11, where it was removed after having been deprecated since C++98.

Type safety in C is limited.

Mohit Jain
  • 1
    "briefly until C++11"? That would be 22 years, or so. What do you consider to be a long time? :) – rici Nov 22 '17 at 07:00