
As far as I know, a string literal can't be modified. For example:

char* a = "abc";
a[0] = 'c';

That would not work, since a string literal is read-only. I can only modify it if I write:

char a[] = "abc";
a[0] = 'c';

However, in the post "Parse $PATH variable and save the directory names into an array of strings", the first answer modifies a string literal in these two places:

path_var[j]='\0';
array[current_colon] = path_var+j+1;

I'm not very familiar with C so any explanation would be appreciated.

HDHDHD
    `a[]` stores a copy of the literal. You're modifying the copy, not the literal itself. – HolyBlackCat Sep 12 '22 at 06:56
    Neither `path_var` nor `array` point to string literals. – n. m. could be an AI Sep 12 '22 at 06:59
    ...and the environment should not be modified... – linuxfan says Reinstate Monica Sep 12 '22 at 07:00
    @linuxfansaysReinstateMonica What makes you say that? If you know the underlying implementation you *can* modify the environment if you're careful, and occasionally this makes sense (though in general you should go through the OS helper functions). – Konrad Rudolph Sep 12 '22 at 07:15
    @KonradRudolph I was referring to the post the OP cited as an example. A comment says "getenv() returns a pointer to a string ... The caller must take care not to modify this string.". And I think it is a good idea, especially in multi-threading situations. Anyway it's quite off-topic. – linuxfan says Reinstate Monica Sep 12 '22 at 07:38
    7.20.4.5.4 in the standard says of the string returned by getenv: "The string pointed to shall not be modified by the program", so just don't do it. – Paul Hankin Sep 12 '22 at 08:29
    As written, I don't think this question is clear enough. The crux of the question is whether some piece of code in another post (the upvoted answer?) modifies a string literal, but it would be better if a minimal reproduction was included here, especially as it's not clear exactly what string literal is being modified (if any). – Paul Hankin Sep 12 '22 at 08:31
  • Modifying a string literal is undefined behavior, but some old, skunky systems did allow this. That's the smelly reason why string literals are still of type `char[]` in C and not `const char[]` as in C++. It's more important to the C committee to preserve backwards compatibility to some old rotten systems from the mid 80's (that were likely replaced long time ago) than to fix design errors in the C language. – Lundin Sep 12 '22 at 09:27
  • It's not about preserving backwards operational compatibility with old implementations of whatever quality. It's about preserving backwards *source* compatibility with sources predating the introduction of `const`, which was not always part of the language. But yes, the committee has historically considered backwards compatibility important. – John Bollinger Sep 12 '22 at 13:54

4 Answers


In programming, there are quite a few rules that are up to you to follow, even though they are not — necessarily — enforced. And "String literals in C are not modifiable" is one of those. So is "Strings returned by getenv should not be modified".

There are some real-world analogies that apply. Here's one: If you're at an intersection, and the light is red, you're not supposed to cross. But, much of the time, if you break the rule, and cross, you might get away with it. You might get a ticket from a policeman — or you might not. You might cause a crash — or you might not. But if you get lucky, and neither of these things happens, that does not imply that crossing the intersection against the red light was okay — it's still quite true that it was very much against the rules.

Similarly, in C, if you write some code that modifies a string literal, or a string returned from getenv, you might get away with it. The compiler might give you a warning or error message — or it might not. Your program might crash — or it might not. But if the program seems to work, that does not imply that these strings are actually modifiable — they're not.

Steve Summit

Code blocks from the post you linked:

const char *orig_path_var = getenv("PATH");
char *path_var = strdup(orig_path_var ? orig_path_var : "");

const char **array;
array = malloc((nb_colons+1) * sizeof(*array));
array[0] = path_var;
array[current_colon] = path_var+j+1;

First block:

  • In the 1st line, getenv() returns a pointer to a string, and orig_path_var is set to point to it. The string that getenv() returns should be treated as read-only, because the behaviour is undefined if the program attempts to modify it.
  • In the 2nd line, strdup() is called to make a duplicate of this string (or of the empty string "" if getenv() returned a null pointer). strdup() does this by calling malloc() to allocate enough memory for the string plus its terminating '\0', and then copying the string into that memory.
  • Since the copy lives in memory obtained from malloc() (on the heap), it belongs to the program, and the program is free to modify it.

Second block:

  • In the 1st and 2nd lines, array is declared and then made to point to a malloc()ed array of const char * pointers. There are nb_colons+1 pointers in the array, one per directory.
  • Then in the 3rd line, the 0th element of array is initialized to path_var (remember, path_var points to the writable copy made by strdup(), not to the string returned by getenv()).
  • In the 4th line, element current_colon of array is set to path_var+j+1. If you don't understand pointer arithmetic, this just means it assigns the address of path_var[j+1], the character just after position j, to array[current_colon].

As you can see, the code does not modify orig_path_var (the string returned by getenv()), and it does not modify a string literal either. Instead, it works on a copy made with strdup(). This seems to be where your confusion stems from, so take a look at this:

char *strdup(const char *s);

The strdup() function returns a pointer to a new string which is a duplicate of the string s. Memory for the new string is obtained with malloc(3), and can be freed with free(3).

The above text shows what strdup() does according to its man page.

It may also help to read the malloc() man page.

programmer
  • Re “`getenv()` returns a pointer to a string literal”: A string literal is (formally) text in source code such as `"abc"` or (informally) the array created for such source code. `getenv` returns a pointer to (the first character of) a string. It is not specified to be a string literal, and it is unlikely to be such. The “literal” part of “string literal” indicates its value is actually in its letters; the string `"abc"` contains “a”, “b”, and “c” in its value. There is nothing in `getenv` that indicates the value literally, so “literal” is inappropriate for it. – Eric Postpischil Sep 12 '22 at 12:55
  • The string returned by `getenv` could be described as a constant in some sense, as the behavior is undefined if the program attempts to modify it (except that a subsequent call to `getenv` may do so). If `getenv` were created today, it would be declared to return a `const char *` instead of a `char *`, but `const` or constant are different properties than being literal. – Eric Postpischil Sep 12 '22 at 12:58

In the example

char* a = "abc";

the token "abc" produces a literal object in the program image, and denotes an expression which yields that object's address.

In the example

char a[] = "abc";

The token "abc" serves as an array initializer and doesn't denote a literal object. The declaration is equivalent to:

char a[] = { 'a', 'b', 'c', 0 };

The individual character values of "abc" are literal data that is recorded somewhere, somehow, in the program image, but they are not accessible as a string literal object.

The array a isn't a literal, needless to say. Modifying a doesn't constitute modifying a literal, because it isn't one.

Regarding the remark:

That would not work since string literal is read-only.

That isn't accurate. No version of the ISO C standard to date specifies any requirements for what happens if a program tries to modify a string literal: it is undefined behavior. If your implementation stops the program with some diagnostic message, that is one possible outcome of the undefined behavior, not something the standard requires.

C implementations are not required to support string literal modification, which has benefits such as these:

  • standard-conforming C programs can be translated into images that can be burned into ROM chips, such that their string literals are accessed directly from that ROM image without having to be copied into RAM on start-up.

  • compilers can condense the storage for string literals by taking advantage of situations when one literal is a suffix of another. The expression "string" + 2 == "ring" can yield true. Since a strictly conforming program will not do something like "ring"[0] = 'w', due to that being undefined behavior, such a program will thereby avoid falling victim to the surprise of "string" unexpectedly turning into "stwing".

Kaz

There are several reasons why you had better not modify them:

  • First, the operating system and/or the compiler can enforce the non-writable property of string literals by putting them in read-only memory (e.g. ROM) or in a read-only section of the executable, such as .rodata or .text.
  • Second, the compiler is allowed to merge string literals, so if you modify one (and the write succeeds), you can get surprises later, because other literals that were merged with it (e.g. because one of them is a suffix of the other) appear to change for no reason.
  • If you need an initialized string that is modifiable, declare an array initialized from the literal; you can modify that array freely:
char array[100] = "abc"; // initialized to { 'a', 'b', 'c', '\0',
                         //         /* and 96 more '\0' characters */
                         // };
Luis Colorado