1

I was reading up on common C pitfalls and came across this article on a well-known university website (it is the second link that comes up on Google).

The last example on that page is:

// Memory allocation on the stack
void b(char **p) {
    char * str="print this string";
    *p = str;
}

int main(void) {
    char * s;
    b(&s);
    s[0]='j'; //crash, since the memory for str is allocated on the stack, 
              //and the call to b has already returned, the memory pointed to by str 
              //is no longer valid.
    return 0;
}

That explanation in the comment got me thinking: isn't the memory for string literals static?

Isn't the actual error there then that you are not supposed to modify string literals, because it is undefined behavior? Or are the comments there correct and my understanding of that example is wrong?

Searching further, I found the question "referencing a char that went out of scope", from which I understood that the following is valid code.

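#include <stdio.h>

char *a = NULL;

int main(void) {
    {
        char *b = "stackoverflow"; /* b goes out of scope at the closing brace */
        a = b;
    }
    puts(a); /* still valid: the literal itself has static storage duration */
    return 0;
}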

This question also agrees with the other Stack Overflow question and with my thinking, but contradicts the comment in that website's code.

To test it, I tried the following:

#include <stdio.h>

void b(char **p)
{
    char * str = "print this string";
    *p = str;
}

int main(void)
{
    char * s;
    b(&s);
    // s[0]='j'; //crash, since the memory for str is allocated on the stack,
                //and the call to b has already returned, the memory pointed to by str is no longer valid.
    printf("%s \n", s);
    return 0;
}

which, as expected, does not give a segmentation fault.

Deduplicator
Duck Dodgers

3 Answers

3

The C standard says (emphasis mine):

6.4.5 String literals

  1. [...] The multibyte character sequence is then used to initialize an array of static storage duration and length just sufficient to contain the sequence. [...]

  2. [...] If the program attempts to modify such an array, the behavior is undefined. [...]

Jean-Baptiste Yunès
2

No, you misunderstand the reason for the crash. String literals have static storage duration, meaning that they exist for the lifetime of the program. Since your pointer points to the literal, you can use it at any time.

The reason for the crash is that string literals are read-only. In fact, char* x = "" is an error in C++ (since C++11); it should be const char* x = "". Literals are read-only from the language's perspective, and any attempt to modify them leads to undefined behavior.

In practical terms, they are often put in a read-only segment, so on such platforms an attempt at modification typically triggers a general protection fault (GPF). The usual response to a GPF is program termination, and this is what you are witnessing with your application.

SergeyA
  • @SergeyA _"Isn't the actual error there then that you are not supposed to modify string literals, because it is undefined behavior?"_ so you mean it is not undefined behavior? – Duck Dodgers Jan 18 '19 at 16:07
  • *so any attempt at modification triggers a GPF* this is also wrong. It is UB, not necessarily a GPF or hardware fault. – 0___________ Jan 18 '19 at 16:07
  • @It is the explanation of what happens to OP. I could add more wording to highlight the difference between UB and the observed symptoms. – SergeyA Jan 18 '19 at 16:08
  • @JoeyMallone UB is not an error. They are two distinct things. – 0___________ Jan 18 '19 at 16:08
  • You wrote *any attempt*, which is not true. For example, on ARM microcontrollers they will be placed in flash memory and a write will not trigger any event; it will just not write the new value. – 0___________ Jan 18 '19 at 16:09
  • On AVR microcontrollers they will be placed in RAM and the write will succeed. – 0___________ Jan 18 '19 at 16:13
  • `Any attempt to modify them will lead to undefined behavior` - is the new wording. And it is correct statement. – SergeyA Jan 18 '19 at 16:13
  • @Sergey, for me it was strange that a uni website (and that too, not a bad uni) is teaching it wrong, but thanks for explaining about GPF. – Duck Dodgers Jan 18 '19 at 16:24
1

String literals are generally placed in a read-only section of the executable (for example, .rodata in an ELF file), and under Linux, Windows, and macOS they end up in a memory region that generates a fault when written to (the OS configures this through the MMU or MPU when loading the program).

izac89
  • There is a need if one wants to really understand the reason for the seg fault and is not satisfied with the formal, high-level UB explanation (which I obviously do not argue with). – izac89 Jan 18 '19 at 16:13
  • @Deduplicator agree, I'll add it to my answer. – izac89 Jan 18 '19 at 16:21
  • C does not specify nor require an ELF file. ELF, rodata, and OS are all implementation related details. Much compiled C has no OS, no read-only sections, etc. concerns. – chux - Reinstate Monica Jan 18 '19 at 18:12