
My understanding is as follows:

  • char * points to a string constant, modifying the data it points to is undefined. You can, however, change where it points to.

  • char[] refers to a block of memory that you can change. You can change its contents but not what it refers to.

  • strcpy(dest, src) copies src into dest.

My question is, is it incorrect to use strcpy() with the dest being a char * that is already pointing to something (as I believe the old contents will be overwritten by strcpy() - which is undefined behaviour)?

For example:

char *dest = malloc(5);
dest = "FIVE";

char *src = malloc(5);
src = "NEW!";

strcpy(dest, src); /* Invalid because chars at dest are getting overwritten? */
CS Student
  • 5
    `char * points to a string constant` - Nope. A `char *`, appropriately set, points to a **`char`**. Whether it is a sequence of `char` appropriately null-terminated is an artifact of the data it addresses. And it isn't constant. The general description of an *array* (in your list: `char[]`) is closer to reality. A pointer *holds* an address; an array *is* an address. – WhozCraig Aug 26 '14 at 10:00
  • 2
    First, char * is a pointer to a char. In C, strings are sequences of char terminated by a zero char, so generally char * points to the start of such a string. But it can also point to the start of a buffer meant to accept such a string. If the string is not a literal, it can be modified. – Rudy Velthuis Aug 26 '14 at 10:02
  • 2
    @WhozCraig "arrays are address" - uh, arrays *have* addresses – M.M Aug 26 '14 at 10:04
  • @MattMcNabb so do pointers: `char *p; char **pp = &p;` One of them (pointers) holds an address that can be modified. One of them (arrays) *is* an address that cannot. My grammar was horrid when I wrote that, and I reworded it, but it is still in line with this comment. – WhozCraig Aug 26 '14 at 10:05
  • 3
    Arrays are not an address; they are an object that consists of a non-zero number of elements – M.M Aug 26 '14 at 10:06
  • 1
    @user93353 I know my understanding is poor, I am very much a beginner. Learning C as my first language is tough. – CS Student Aug 26 '14 at 10:06
  • @MattMcNabb **where**? – WhozCraig Aug 26 '14 at 10:06
  • @WhozCraig where what? somewhere in memory. – M.M Aug 26 '14 at 10:07
  • @MattMcNabb yeah, somewhere in memory; an *address*. I stand by that comment. – WhozCraig Aug 26 '14 at 10:09
  • @WhozCraig K&R states this: `char *pmessage = "now is the time";` with the comment "pmessage is a pointer, initialized to point to a string constant; the pointer may subsequently be modified to point elsewhere, but the result is undefined if you try to modify the string contents." That's where I got the idea of a constant. What have I misunderstood? – CS Student Aug 26 '14 at 10:09
  • 1
    @WhozCraig The array is *stored at* an address. Not *is* an address. That's like saying a person is a house. – M.M Aug 26 '14 at 10:10
  • @WhozCraig, I think that just because arrays decay to addresses in practically every operation on them, doesn't make them addresses. `sizeof(char[100])` is much more than the size needed to store an address after all. – StoryTeller - Unslander Monica Aug 26 '14 at 10:10
  • @CSStudent K&R is correct, and a decent compiler will warn you of said assignment now, pedantically so if it isn't `char const *pmessage = ...` – WhozCraig Aug 26 '14 at 10:11
  • 1
    @MattMcNabb I understand, believe me. The type is certainly not an address-type (expression usage notwithstanding). I concur on that, and StoryTeller, I agree. It's the *use* that drives my claim. If you could pony up a situation where an array is used *without* its address I'd be most interested. Somewhat akin to a tree falling in the forest with no one there to hear it (as long as we're tossing bad analogies around, and on that, a house is a pointer =P). – WhozCraig Aug 26 '14 at 10:17
  • @WhozCraig I'm sure you understand arrays which is why your choice of words is even more puzzling. All variables have addresses, an array is nothing special in that regard. And in English, "A is B" is generally not equivalent to "A has B" ! I'm not clear what you are asking for with "a situation where an array is used without its address". Whatever the situation , if you consider the variable to be "used with its address" (whatever that means), taking an `int` instead of an array must also be "used with its address". – M.M Aug 26 '14 at 10:23
  • @WhozCraig It would be better to stick to the official wording: an array *decays* into the address of its first element under certain circumstances (i. e., except on using `&`, `sizeof` and a handful of other things). That means, the name of the array `a` is equivalent to `&a[0]`. – glglgl Aug 26 '14 at 10:29
  • @MattMcNabb Certainly (I think). On that, `int a;` has an address, but is not, expressed as a *value*, such. Like an array, it isn't "going" anywhere. You can't "move" it. You can change its value just as a pointer can have its address changed. I'm quite certain you understand my perspective, and perhaps even acknowledge its merit, misplaced as it may well be. I admit it's an odd perspective, but no more so than the smattering of "decay" tossed about in C forums, a word that appears exactly *nowhere* in the standard, yet is referred to like gospel. – WhozCraig Aug 26 '14 at 10:30
  • @glglgl If we're sticking to the "official" wording, then remove "decays" from the vernacular, as the C standard makes no mention of it *at all* (at least through C99). – WhozCraig Aug 26 '14 at 10:32
  • @WhozCraig I don't understand your perspective. Arrays are equally "movable" as ints (i.e. neither is). "decay" is a handy shortcut for the syntax rule about using an array's identifier in an expression (which has nothing at all to do with the semantics of an array). – M.M Aug 26 '14 at 10:33
  • @MattMcNabb that specific word I simply loathe. It is a *verb* that implies a functional transformation from A to B (something that isn't A) at runtime which is *not* what happens. Interesting history. I asked once (on this forum) if anyone could find reference to its first emergence since it wasn't standard-based. After much searching the earliest reference was somewhere in a newsgroup post circa 1988. I've searched in vain to find an earlier reference. A *lot* of people hunted on that one. – WhozCraig Aug 26 '14 at 10:36
  • 1
    @MattMcNabb you're absolutely right, it isn't runtime at all. I don't see the conversion examples cited as synonymous at all. One can easily introduce value inequity (well, going from `long` to `int`, anyway). The other won't. I can agree to disagree. I'm sure people see it either way, and will even admit to the likelihood of being in the minority. Maybe it's the rose tint in my glasses. Maybe it just makes it easier to explain what is happening under the covers (at least for/to me). It's nearly 4:00am here so I'm hitting it, but thanks for your time, Matt. Truly appreciate it. – WhozCraig Aug 26 '14 at 10:47
  • 1
    The C standard text is "an expression that has type ‘‘array of type’’ is converted to an expression with type ‘‘pointer to type’’". It is a conversion (an implicit one). Whether it occurs at runtime or not is not specified by the standard (which does not define "runtime" or directly distinguish runtime from compile-time). – M.M Aug 26 '14 at 10:47
  • @WhozCraig I reworded my comment to refer directly to the standard description of this process. 'Night – M.M Aug 26 '14 at 10:48
  • 1
    There is value inequity on the array-to-pointer conversion, in that the value of a pointer is different to the value of an array. Actually the Standard does not define "value of an array" either. I have seen people say "The value of an array is its address", however that is also wrong (see the definition of "value" in section 3, it excludes this possibility). The meaning is that *an attempt to access the value of an array actually specifies a conversion from array to pointer* . This is specified by 6.3.2.1/3 . – M.M Aug 26 '14 at 10:51
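
For readers following the array-versus-address discussion above, here is a minimal sketch of the two points made in the comments: `sizeof` sees the whole array rather than a pointer, while in most other expressions the array name is converted to the address of its first element. The function names are hypothetical and exist only for illustration.

```c
#include <stddef.h>

/* Hypothetical helpers, not from the thread. */

/* sizeof applied to an array yields the size of the whole object: 100 bytes. */
size_t array_size(void)
{
    char buf[100];
    return sizeof buf;
}

/* sizeof applied to a pointer yields the size of the pointer itself. */
size_t pointer_size(void)
{
    char buf[100];
    char *p = buf;              /* buf is converted to &buf[0] here */
    return sizeof p;
}

/* In an ordinary expression the array name compares equal to &buf[0]. */
int converts_to_first_element(void)
{
    char buf[100];
    return buf == &buf[0];
}
```

On typical platforms `pointer_size()` returns 4 or 8, which is much smaller than the 100 bytes reported for the array.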

4 Answers


Your understanding is not totally correct, unfortunately.

char * points at character data, and since there's no const in there, you can write to the data being pointed to.

However, it's perfectly possible to do this:

char *a = "hello";

which gives you a read/write pointer to data that must not be modified: string literals are typically stored in read-only memory, but the language does not treat them as const, so the compiler accepts the assignment.

It's better to write the above as:

const char *a = "hello";

to make it clearer that you cannot modify the data pointed at by a.
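
As a small sketch of what the const version buys you (the function name `repoint_demo` is hypothetical, made up for this illustration): the pointer itself can still be changed, but an attempt to write through it is rejected at compile time instead of misbehaving at run time.

```c
#include <stddef.h>
#include <string.h>

/* Hypothetical helper, not part of the original answer. */
size_t repoint_demo(void)
{
    const char *a = "hello";    /* pointer to a string literal */

    /* a[0] = 'j'; */           /* would not compile: the pointee is const */

    a = "goodbye";              /* fine: only the pointer changes */
    return strlen(a);           /* length of "goodbye" */
}
```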

Also, your examples mixing malloc() and assignment are wrong.

This:

char *dest = malloc(5);
dest = "FIVE"; /* BAD CODE */

Is bad code, and you should never do that. It simply overwrites the pointer returned by malloc() with a pointer to the string "FIVE", which exists somewhere in (again, read-only) memory as a string literal. The allocated block can then no longer be reached or freed.

The proper way to initialize newly allocated memory with string data is to use strcpy():

char *dest = malloc(5);
if(dest != NULL)
  strcpy(dest, "five");

Note that checking the return value of malloc() is a good idea.

There's no problem doing multiple writes to the same memory, that's a very basic idea in C; variables represent memory, and can be given different values at different times by being "written over".

Something as simple as:

int a = 2;

printf("a=%d\n", a);
a = 4;
printf("a=%d\n", a);

demonstrates this, and it works just fine for strings too of course since they are just blocks of memory.

You can extend the above malloc()-based example:

char *dest = malloc(5);
if(dest != NULL)
{
  strcpy(dest, "five");
  printf("dest='%s'\n", dest);
  strcpy(dest, "four");
  printf("dest='%s'\n", dest);
  strcpy(dest, "one");
  printf("dest='%s'\n", dest);
}

and it will print:

dest='five'
dest='four'
dest='one'
unwind

My understanding is as follows:

  • char * points to a string constant, modifying the data it points to is undefined. You can however change where it points to.

Here you refer to an expression like

char * string = "mystring";

You are right that doing string[1]='r'; is undefined. But that is not because of the char *; it is because the pointer refers to a string literal, which is typically placed in read-only memory.

Compare this to

char string[] = "mystring";

where I define an array in RAM and the string is copied into it. Here it is allowed to do string[1] = 'r';, because we are in normal data memory.

This seems to support your assumption, but take this:

char string[] = "mystring";
char * string2 = string;

Here string2[1] = 'r'; is valid, because it points to a location where writing is ok as well.
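
A minimal sketch of that last case, wrapped in a function so the result can be inspected (the name `modify_via_pointer` and the `out` parameter are assumptions made for this illustration only):

```c
#include <string.h>

/* Hypothetical helper: out must have room for at least 9 bytes. */
void modify_via_pointer(char *out)
{
    char string[] = "mystring";     /* writable array: 8 chars plus '\0' */
    char *string2 = string;         /* points at the array's first element */

    string2[1] = 'r';               /* OK: the storage belongs to the array */

    /* Copy the result out, because string stops existing on return. */
    memcpy(out, string, sizeof string);
}
```

After the call, the buffer holds "mrstring". Doing the same write through a pointer to the literal itself would be undefined.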

char[] refers to a block of memory that you can change its contents but not what it refers to.

Yes, because there the name is just the name of a variable and not a pointer.

strcpy(dest, src) copies src into dest.

Right.

My question is, is it incorrect to use strcpy() with the dest being a char * that is already pointing to something (as I believe the old contents will be overwritten by strcpy() - which is undefined behaviour)?

It depends what you mean by "already pointing to something"...

For example:

char *dest = malloc(5);
dest = "FIVE";

char *src = malloc(5);
src = "NEW!";

strcpy(dest, src); /* Invalid because chars at dest are getting overwritten? */

Here you again mix up several things.

First, you have dest point to a brand new chunk of memory. Afterwards, you have it point to somewhere else where you cannot write, and the chunk of memory is lost (memory leak).

The same happens with src.

So the strcpy() call is invalid: dest now points at a string literal, and writing there is undefined behaviour.

You can do

char *dest = malloc(5);

char *src = "NEW!";

strcpy(dest, src);

as here dest points to a writable place, and src points to useful data.
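
The same idea as a small reusable sketch. The helper name `copy_string` is hypothetical. It sizes the allocation from the source (length plus one for the terminating '\0'), checks the result of malloc(), and leaves it to the caller to free() the block.

```c
#include <stdlib.h>
#include <string.h>

/* Hypothetical helper: returns a writable heap copy of src, or NULL
   if the allocation fails. The caller is responsible for free(). */
char *copy_string(const char *src)
{
    size_t size = strlen(src) + 1;  /* +1 for the terminating '\0' */
    char *dest = malloc(size);

    if (dest != NULL)
        strcpy(dest, src);          /* dest is writable, so this is fine */

    return dest;
}
```

A call such as `char *p = copy_string("NEW!");` gives a buffer that can later be overwritten with strcpy() again, provided the new string fits in the allocated size.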

glglgl
  • Thanks for the detailed answer. What I meant by "is already pointing to something" is that if I removed the `mallocs` and just had `char *dest = "FIVE"` and `char *src = "NEW!"` then I used `strcpy()`, would that be legal or is `strcpy()` overwriting the string that `dest` points to? – CS Student Aug 26 '14 at 10:30
  • 1
    "or is `strcpy()` overwriting the string that dest points to" Yes, that's what it is meant to do. It gets an address, tries to write to it, but when it is not allowed to write there, it leads to undefined behaviour (UB). – glglgl Aug 26 '14 at 10:32
  • So if I had `dest` pointing to a new chunk of memory from `malloc` I could use `strcpy()` without UB? – CS Student Aug 26 '14 at 10:36
  • @CSStudent Yes, that's what it is made for. But you can do as well `char dest[10]; strcpy(dest, "DATA");`. – glglgl Aug 26 '14 at 10:40

A quick analysis:

char *dest = malloc(5);
// 'dest' is set to point to a piece of allocated memory
// (typically located in the heap)
dest = "FIVE";
// 'dest' is set to point to a constant string
// (typically located in the code-section or in the data-section)

You are assigning variable dest twice, so obviously, the first assignment has no meaning.

It's like writing:

int i = 5;
i = 6;

On top of that, you "lose" the address of the allocated memory, so you will not be able to release it later.
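
A sketch of one way to avoid losing the address, assuming the goal was simply to get the text "FIVE" into the allocated block (the function name `fill_then_free` is made up for this example): copy the characters instead of reassigning the pointer, so the original address is still available for free().

```c
#include <stdlib.h>
#include <string.h>

/* Hypothetical helper: returns 1 if the block held "FIVE" before it
   was released, 0 if not, and -1 if the allocation failed. */
int fill_then_free(void)
{
    char *dest = malloc(5);         /* room for "FIVE" plus '\0' */
    if (dest == NULL)
        return -1;

    strcpy(dest, "FIVE");           /* copies characters; dest itself is unchanged */

    int ok = (strcmp(dest, "FIVE") == 0);

    free(dest);                     /* still possible: the address was never lost */
    return ok;
}
```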

barak manos

char* is a pointer to a memory address, so you CAN modify the information contained at that address, as long as that memory is writable.

The difference between char* and char[] is that char[] is not dynamic, you can't change its size. Also, a char * often points to an address on the heap (for example, a block returned by malloc()), while a char[] declared as a local variable is typically stored on the stack of your program.

You can use strcpy with both pointers and arrays and it will work since data from both can be overwritten.

Mognom
  • "char[] is not dynamic", not entirely true. You can use VLA, or the array is part of struct which is dynamically allocated. It'd be best if you phrased your assertions more precisely instead of just giving blanket statement about "char[]". That or use code samples to clarify. – StoryTeller - Unslander Monica Aug 26 '14 at 10:07