4

Possible Duplicate:
Is a string literal in c++ created in static memory?

If I do:
const char* StringPtr = "string0",
then it is definitely somewhere in the memory, and I can get the address of StringPtr.

But if I do:
#define STRING0 "string0", then where does STRING0 reside?
Or does STRING0 not exist in memory at all, because the compiler replaces every use of STRING0 with "string0"?

As far as I know, whenever you write a string in your code, the compiler must put it somewhere in memory, but I don't know the exact behavior and I am not very sure about this.

Can anyone explain how strings that are #define-ed or declared as char* are manipulated by the compiler?

Also, which one is better for strings in a header file: #define, extern const char*, or extern const std::string?

Thanks!

Marson Mao
  • It occupies no memory. Before the program is compiled, you can imagine it like this: the preprocessor copies the value you gave *STRING0* and pastes it wherever you used it. – Name Dec 26 '12 at 04:30

5 Answers

5

In almost all cases, the compiler is allowed to put a string literal wherever it wants. There might be one copy for each time the literal appears in source code, or one master copy shared among the instances.

This sometimes causes trouble in C, where string literals are not const-qualified and the type system lets you write to them (doing so is still undefined behavior). On one platform all the identical strings get changed, while on another the changes don't propagate. As of C++11, string literals no longer convert implicitly to a non-const char*, so the mistake is harder to make.

The strings will all be initialized before the program starts, so in effect they are part of the executable binary image. That much is certain.

What would be different is this:

const char StringPtr[] = "string0";

This defines a dedicated array object with a unique address.

Potatoswatter
  • Hi, does enabling the compiler option "Enable string pooling" make the compiler keep only one master copy? And, if I declare `const char StringPtr[]`, does it reside in read-only memory just like `const char*`? – Marson Mao Dec 26 '12 at 03:58
  • @MarsonMao Yes, that's the idea of that option. Yep, in theory that's in read-only memory, although of course the physical machine might not use literally read-only memory to store it. – Potatoswatter Dec 26 '12 at 04:00
  • Note that the `const char*` variable can also be changed to point at something else, unlike the array declared with `[]`. The pointer is kept in non-const memory, whereas the array has nothing kept non-const. – Potatoswatter Dec 26 '12 at 04:06
  • Thanks. So is there any size limit on read-only memory? For example, if I declared tons of `const char*` and tons of `const int`, etc., in the whole project, would that be a severe problem? – Marson Mao Dec 26 '12 at 04:07
  • @MarsonMao That depends on the platform, but typically no. String data is the same as anything else, and the OS should be able to handle program files of hundreds of megabytes. – Potatoswatter Dec 26 '12 at 04:08
  • I see, so I'd better declare `const char* const ptr = "string0"` – Marson Mao Dec 26 '12 at 04:11
  • Ok, so the final question. Since I need to share the strings across many files, is it ok to declare them as `extern const char*` in a header file, or is there a better way to do this? – Marson Mao Dec 26 '12 at 04:16
  • @MarsonMao If they're constant, then that's the way to go. They must be defined in one source file, to which the header provides access. If they get modified, you need to use a non-const array using `[]`. – Potatoswatter Dec 26 '12 at 05:10
  • I see, very clear answer. I'm gonna use them as constant. Thank you very much! Lets swat the potatoessss lol!! – Marson Mao Dec 26 '12 at 06:06
1
#define STRING0

STRING0 does NOT reside in memory. It does NOT even exist during compilation. During preprocessing, all occurrences of STRING0 are replaced with "string0" by the preprocessor. After this stage, none of the following stages, nor the compiled application, know of the existence of any symbol named STRING0.

Once this happens, many if not all instances will end up as separate string literals (your const char* case) all over your code. Where these are stored in memory is better answered by @Potatoswatter and the link provided by @silico.

Karthik T
1

The string behind StringPtr resides in the executable's data segment. If you open your exe in a text editor, you will be able to search for it.

The macro exists only for the duration of the preprocessing stage of building your program.

Depending on your compiler, if you use the macro method you can end up with several separate instances of an identical string in your exe, whereas with the char* method you get just a single instance.

James
0

#define is a preprocessor macro. It will replace STRING0 with "string0" during the precompile stage before the code is then compiled.

"string0" resides in the executable's static read-only memory.

StringPtr is a variable, that is why you can take its address. It simply points at the memory address of "string0".

Remy Lebeau
  • So does the `"string0"` pointed to by `StringPtr` reside in the executable's static read-only memory too? – Marson Mao Dec 26 '12 at 04:02
  • That is what I said: `"string0"` resides in the executable's static read-only memory. `StringPtr` merely points to the memory address of `"string0"`, but is not itself in static read-only memory. – Remy Lebeau Dec 26 '12 at 20:12
  • Here is an example where instances of "Hello" cannot be merged AND they ARE NOT PLACED IN THE READ-ONLY MEMORY. See variables t1 and t2: aszt.inf.elte.hu/~gsd/halado_cpp/ch01.html – user1284631 Apr 15 '13 at 12:31
-1

When you use #define, it is not the compiler but the preprocessor that textually replaces STRING0 with "string0" in the preprocessed source file, before passing it to the compiler proper.

The compiler never sees the STRING0, but only sees "string0" everywhere that you wrote STRING0.

edit:

Each instance of "string0" that replaces the STRING0 you wrote in the source file is a string literal per se. If those string literals are guaranteed (or declared) to be invariant, then the compiler might optimize memory allocation by storing a single copy of "string0" and pointing the other uses at that copy (I rephrased this paragraph in an edit).

(edit: those identical string literals might be merged into a single copy; however, this is up to the compiler. The standard does not require or enforce it: http://www.velocityreviews.com/forums/t946521-merging-of-string-literals-guaranteed-by-c-std.html )

As for your last question: the most portable is to declare those as: const char *

later edit: the best discussion about the string literals that I found so far is here: https://stackoverflow.com/a/2245983/1284631

Also, beware that a string literal can also appear as the initializer of a statically-allocated char array, in which case it cannot be merged with other copies of it, since the content of the array may be overwritten. See the example below, where the two identical string literals "hello" cannot be merged:

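#include <stdio.h>
#include <string.h>

int main(void) {
        /* x is its own array; "hello" only supplies its initial contents */
        char x[50] = "hello";
        printf("x=%s, &x[0]=%p\n", x, (void *)&x[0]);

        /* y points at the string literal itself */
        const char *y = "hello";
        printf("y=%s, &y[0]=%p\n", y, (const void *)&y[0]);

        /* overwriting x is legal and leaves the literal untouched */
        strcpy(&x[0], "zz");
        printf("x=%s, &x[0]=%p\n", x, (void *)&x[0]);

        return 0;
}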

The output of this code is:

x=hello, &x[0]=0x7fff8a964370
y=hello, &y[0]=0x400714
x=zz, &x[0]=0x7fff8a964370
user1284631
  • Some compilers have an option to merge duplicate strings together, so all instances of `"string0"`, regardless of usage, may get merged into a single instance in memory. – Remy Lebeau Dec 26 '12 at 03:55
  • @RemyLebeau: then it should be used with care, as one could still modify one instance of the string in place and inadvertently modify all the others. – user1284631 Dec 26 '12 at 03:56
  • I would expect all instances to be placed in the executable's static read-only memory, so any attempt to directly modify the data will crash the app. That has been my experience. – Remy Lebeau Dec 26 '12 at 04:01
  • @axeoth Since string literals are constant (it's only for backward compatibility with C that you're allowed to write `char *foo = "this should be illegal"`), compilers can, and generally will, merge string literals. I'd be very surprised if some production compiler in release mode actually duplicated all the strings. – Voo Dec 26 '12 at 04:04
  • @Voo: it might be habitual, but I think it is not guaranteed that the strings are deduplicated. No matter how "const" that pointer is declared, you can still alter the content of the string. – user1284631 Dec 26 '12 at 04:07
  • @axeoth Yes you can if you want undefined behavior. You can also dereference a null pointer - doesn't mean you should ever do it. – Voo Dec 26 '12 at 04:08
  • @RemyLebeau: there are several architectures and executable formats where there is no "read-only" part in memory, executable or not. – user1284631 Dec 26 '12 at 04:08
  • @voo: please, look here for a widely used dereferencing of a NULL pointer: http://open-nandra.com/2009/08/container_of-macro-or-how-it-works/ – user1284631 Dec 26 '12 at 04:09
  • @RemyLebeau: the standard does not guarantee merging of the string literals, see here: http://www.velocityreviews.com/forums/t946521-merging-of-string-literals-guaranteed-by-c-std.html Everything outside the standard cannot be assumed. However, I'll update the answer to point to that. – user1284631 Dec 26 '12 at 04:14
  • @axeoth Goodness, now you're playing language spec lawyer? Ok, fine: *reading* the value of a dereferenced null pointer results in undefined behavior and should never be done (whether merely dereferencing a null pointer already results in undefined behavior is not clear, as I understand it, so who knows), which is the same as *changing* the value of a string literal. – Voo Dec 26 '12 at 04:15
  • @axeoth The point isn't that compilers are required to avoid duplicating literals (impossible to guarantee anyhow) but that doing so will never result in broken valid code, because changing the value of a string literal results in undefined behavior. – Voo Dec 26 '12 at 04:22
  • @voo: string literals may also appear in the initialization of statically-allocated arrays, which can be initialized through a #define-d STRING0. In the latter case, their content is part of the array and the const-ness has nothing to do with that. See the example that I added in my answer. – user1284631 Dec 26 '12 at 09:00
  • @RemyLebeau: I think not all instances of "string0", regardless of use, can be merged. Some "string0" may serve as simple initializers for statically-allocated char[] arrays, instead of {'s','t','r','i','n','g','0','\0'}. See the example that I put in my answer. – user1284631 Dec 26 '12 at 09:06
  • @RemyLebeau: here is an example where instances of "Hello" cannot be merged. See variables t1 and t2: http://aszt.inf.elte.hu/~gsd/halado_cpp/ch01.html – user1284631 Apr 15 '13 at 12:30