
My homework is to work out what the following code does (on paper, without running it on a computer).

char s1[]="Short Message Service", *s2, *s3;
s2=strchr(s1,'M');
s3=strchr(s2,'S');
strncpy(s1+1,s2,1);
strcpy(s1+2,s3);

When I wanted to check whether I got it right, I ran it on a computer and got this result:

s1 = SMService

s2 = ice

s3 = Service

I expected s2 to be "Message Service", but it changes to "ice". Apparently it changes after strcpy(s1+2,s3) is called. Can somebody explain why, and how that function affects s2?

Roberto Caboni
Blu Dog

2 Answers


The answer is "undefined behaviour": anything can happen. The arguments to strcpy() and strncpy() must not overlap, yet here the arguments to strcpy() do overlap.

C11 §7.24.2.3 The strcpy function ¶2:

The strcpy function copies the string pointed to by s2 (including the terminating null character) into the array pointed to by s1. If copying takes place between objects that overlap, the behavior is undefined.

§7.24.2.4 The strncpy function ¶2

The strncpy function copies not more than n characters (characters that follow a null character are not copied) from the array pointed to by s2 to the array pointed to by s1.[308] If copying takes place between objects that overlap, the behavior is undefined.

[308] Thus, if there is no null character in the first n characters of the array pointed to by s2, the result will not be null-terminated.

That means no reliable answer can be given. You could describe what happens if the copy operations work from the start of the source towards its end, writing over the destination as they go, which is what your instructor likely expects. But that is not guaranteed behaviour.

Given the following code and the left-to-right copying assumption:

char s1[] = "Short Message Service";
char *s2 = strchr(s1, 'M');
char *s3 = strrchr(s2, 'S');
strncpy(s1+1, s2, 1);
strcpy(s1+2, s3);

We can deduce that s2 points to &s1[6] and s3 points to &s1[14] (this much is guaranteed). The values in s1 at various stages are:

s1 = "Short Message Service"   -- initial text
s1 = "SMort Message Service"   -- after strncpy
s1 = "SMService"               -- after strcpy (but this assumes UB works as expected)

So the string starting at s2 now contains ice, as you found.

However, it must be re-emphasized, this is not required behaviour.

Jonathan Leffler
  • It should be `s1 = "SM" -- after strncpy` – isrnick May 14 '20 at 15:02
  • @isrnick: nope; `strncpy()` does not null terminate when the source string is longer than the length. That's the content of footnote 308 which I didn't originally include in the answer (but I've just added it). The `strncpy()` operation copies precisely one character, the `'M'`. – Jonathan Leffler May 14 '20 at 15:04
  • Nice answer. Not only I was late with the "regular" answer, but I was too slow also with the alternative one. – Roberto Caboni May 14 '20 at 15:32
  • @RobertoCaboni: Thanks! My answer was a bit of a 'chameleon answer' in that it grew over time. I added the "but you'll probably need to tell your instructor what they expect to hear" section just a few seconds after you'd posted your answer, so while you were creating your answer with the careful ASCII-art diagrams (which take time to create), I was posting my simpler version of the same analysis. But it's good we agree on the probable result. I'm debating whether to add my example program and its output. – Jonathan Leffler May 14 '20 at 15:39

The other answers already told you the bitter truth: copying overlapping strings with strncpy and strcpy is undefined behaviour and should be avoided, especially when it comes to more complex formats (the same is true for functions such as sprintf).


Still, what you see can be explained by analyzing your code step by step. I want to stress again that when there is undefined behaviour any compiler may behave differently, so we cannot be sure that this is a universal explanation.

The important thing to keep in mind is that all three pointers point into the same array. After the initialization of s1

char s1[]="Short Message Service", *s2, *s3;

the char array looks like this:

----------------------------------------------
|S|h|o|r|t| |M|e|s|s|a|g|e| |S|e|r|v|i|c|e|\0|
----------------------------------------------
 ^
 s1

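Then you set s2 and s3 to the beginning of the second and the third word (strrchr is used here, but strchr finds the same 'S' because it is the only uppercase 'S' from s2 onwards):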

s2=strchr(s1,'M');
s3=strrchr(s2,'S');

Here is where the three pointers point:

----------------------------------------------
|S|h|o|r|t| |M|e|s|s|a|g|e| |S|e|r|v|i|c|e|\0|
----------------------------------------------
 ^           ^               ^
 s1          s2              s3

Since each string runs from its pointer up to the first terminator, printing the three strings gives:

s1: "Short Message Service"
s2: "Message Service"
s3: "Service"

Then you copy just one character of s2 after the first character of s1:

strncpy(s1+1,s2,1);

Please note that when the source string is longer than the maximum length passed to strncpy the string terminator is not copied. The array will look like this:

----------------------------------------------
|S|M|o|r|t| |M|e|s|s|a|g|e| |S|e|r|v|i|c|e|\0|
----------------------------------------------
 ^           ^               ^
 s1          s2              s3

Printing the strings would show little change: s1 has just become "SMort Message Service". Finally you use

strcpy(s1+2,s3);

-----------------------------------------------
|S|M|S|e|r|v|i|c|e|\0|a|g|e| |S|e|r|v|i|c|e|\0|
-----------------------------------------------
 ^           ^                ^
 s1          s2               s3

That's why you get the following when you print the three strings:

s1: "SMService"
s2: "ice"       // Because of the terminator in the middle
s3: "Service"   // The original string ending

Just a practical suggestion

If you need a pointer to every word, you just need to store the beginning of each word, as you already do, and then place a string terminator in the position of each space.

In this way, s1 will be "Short" (because the terminator will be found where the first space was), s2 will be "Message" (because the terminator will be found where the second space was) and s3 will be "Service" (because of the original terminator).

By the way, that's essentially what strtok does: it finds the next delimiter, overwrites it with a string terminator, and returns a pointer to the start of the token.

Roberto Caboni