2

In C the strcpy function is used to copy a source into a destination string.

But when I use a destination char array of size 1, strcpy appears to copy the source into the destination correctly, yet it also changes the source char array. I want to understand how this works in C.

I have researched how to use strcpy correctly, but every example I found uses a destination size greater than 1. My program uses a destination size of exactly 1, and that is where the problem appears.

char a[] = "String ABC";
char b[1];

strcpy(b, a);
int i;
// printf("%c\n", *(&(a[0])-1));

printf("%s\n",a);
printf("%s\n",b);

I expect the output to be

String ABC
String ABC

but the output I get is

tring ABC
String ABC
roschach
FlarrowVerse

5 Answers

3

C performs no bounds checking and will let you overrun the bounds of a buffer. The actual behaviour is undefined, but in your case it is likely that the memory arrangement is thus:

 b a
|-|S|t|r|i|n|g| |A|B|C|\0|

After the strcpy()

 b a
|S|t|r|i|n|g| |A|B|C|\0|\0|

So b contains 'S' and no nul terminator (because there is no room), so when you print it, it runs into a, which now holds "tring ABC".

Other results are possible depending on how the compiler orders and aligns adjacent variables, and how the implementation works with overlapping strcpy() source and destination which is also undefined.

Clifford
3

The problem is that you are copying a longer string into a 1-byte array, which results in undefined behaviour.

If you run this program:

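#include <stdio.h>
#include <string.h>

int main(void)
{
    char a[] = "String ABC";
    char b[1];
    /* %p expects a void *, so cast the addresses */
    printf("%p\n", (void *)&a);
    printf("%p\n", (void *)&b);

    strcpy(b, a);                  /* undefined behaviour: b is too small */
    printf("%c\n", *(&(a[0])-1));  /* also undefined: reads the byte just before a */
    printf("%c\n", a[0]);
    printf("%s\n",a);
    printf("%s\n",b);
    printf("%p\n", (void *)&a);
    printf("%p\n", (void *)&b);
}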

you will likely see that b and a have contiguous addresses, with b stored immediately before a. Most likely strcpy copies the string to b, but since b is not large enough to hold such a long string, the copy overwrites the next contiguous memory cells, which happen to belong to a.

Let me mark each memory cell that stores a char with | |. Suppose -b- is the cell holding the one-char array b. Before the copy you have

|-b-|--a memory allocation-|
|-b-|S|t|r|i|n|g| |A|B|C|\0|

Now a is copied into b: the second cell is the first cell of a, which now contains t

  |--a memory allocation--|
|S|t|r|i|n|g| |A|B|C|\0|\0|

This is what I suppose is happening. But remember that copying a longer string into a shorter array results in undefined behaviour.

roschach
1

You cannot copy a into b, because there is not enough space in b. The strcpy function will simply write past the end of the array, which is undefined behavior. This means the program can behave in any unpredictable way (which sometimes, if you are unlucky, means it works as you expected).

In other words: when you use strcpy, you must ensure the destination buffer is big enough, including the null terminator. In this particular example, it means that b has to be, at least, 11 elements long (10 for the string, 1 for the null terminator).

Acorn
  • But it works fine if I set the buffer to size 2. In case of 'char b[2];', my source remains same i.e., a[0]='S', and not a[0]='t', why? – FlarrowVerse May 05 '19 at 17:06
  • It SEEMS like it "works fine" when the size of b is 2, but it does not work fine because you're overwriting memory that could have important things in it. You're just getting lucky. When using strcpy, it is your responsibility to make absolutely sure that the destination is AT LEAST the length of the string you are copying, +1 for the null terminator. If it isn't, you are going to have big big problems.. eventually. – little_birdie May 05 '19 at 17:10
  • @TheViper if it works for you- code this way :). It is a free country. What problem do you have? – 0___________ May 05 '19 at 17:18
0

As @Acorn mentioned in his answer, the behavior you are seeing is undefined behavior, which means that the compiler is free to generate arbitrary code.

However, if you want to investigate what's happening here (purely for curiosity), it can help to print out the addresses of the arrays.

#include <stdio.h>
#include <string.h>

int main(void){
    char a[] = "String ABC";
    char b[1];

    strcpy(b, a);   // undefined behaviour: b is too small for a
    // printf("%c\n", *(&(a[0])-1));

    printf("%s\n",a);
    printf("%s\n",b);

    // %p expects a void *, so cast the pointers
    printf("%p\n",(void *)a);
    printf("%p\n",(void *)b);
}

On my machine, the output is the following.

ring ABC
String ABC
0x7ffc36f1b29d
0x7ffc36f1b29c

As you can see, the two array addresses differ by only one byte. When you copy the source into the destination, you overwrite the first N-1 characters of the source array with the last N-1 characters of the source, where N is the number of characters in the source, including the null terminator.

merlin2011
  • `which means that the compiler is free to generate arbitrary code` who told you that? The compiler generates code as usual, but the result of its execution is undefined. – 0___________ May 05 '19 at 17:13
  • @P__J__, The wikipedia article on [undefined behavior](https://en.wikipedia.org/wiki/Undefined_behavior) states of the compiler that the `implementation will be considered correct whatever it does in such cases`. I interpret that to mean the compiler is free to generate arbitrary code. – merlin2011 May 05 '19 at 17:33
  • @P__J__: The C standard imposes no requirements on undefined behavior, including code generation. Undefined behavior can affect code generation because many compilers analyze code in sophisticated ways as part of optimization, and undefined behavior can lead to various reductions and transformations of the code during optimization, resulting in different code generation when undefined behavior is present. Yes, if there is undefined behavior on a code path, the compiler is free, per the C standard, to generate arbitrary code for it. – Eric Postpischil May 05 '19 at 17:37
  • @P__J__: As one example, consider the code fragment `if (some test) { arbitrary code } else { code with undefined behavior }` in some larger context. During optimization, the compiler can reason that `some test` may be assumed to be always true—because either it is true or the code will execute undefined behavior, for which the compiler is allowed to behave in any way, including as if `some test` is true. Therefore, the `else` branch can be removed, and the code can be reduced to `{ arbitrary code }`. Thus, for the undefined behavior, the program effectively executes `arbitrary code`. – Eric Postpischil May 05 '19 at 17:40
  • @EricPostpischil `for which the compiler is allowed to behave in any way` it is not allowed. Actually there are two types of UBs. One is related to the compiler behavior, another to runtime behavior. Example of the first one is a = a++ + ++a;, example of another is writing outside the array bounds. The latter will not affect the compiler code generation. – 0___________ May 05 '19 at 18:20
  • @P__J__: There are not two kinds of undefined behavior. C 2018 3.4.3 1 defines just one: “**undefined behavior** behavior, upon use of a nonportable or erroneous program construct or of erroneous data, for which this document imposes no requirements.” In both of your examples, the compiler is free to generate any code. You can see at [Godbolt](https://godbolt.org/z/mxLWGZ) that GCC does not just mindlessly generate memory lookups for array accesses. When there is undefined behavior it recognizes at compile time, it generates whatever code its algorithms select. – Eric Postpischil May 05 '19 at 19:01
  • @EricPostpischil formally it one being - but using brain they are distinct in the code generation context. – 0___________ May 05 '19 at 19:10
  • @P__J__: “Using brain”? I have given reasoning (the compiler can recognize some run-time behavior at compile-time, and it affects optimization), a citation from the standard, and an example of it in one compiler (Clang behaves similarly, so two). Where is your reasoning? Why **cannot** a compiler generate whatever code it wants when it recognizes **any** undefined behavior, regardless of the nature of that behavior? Do you claim GCC was wrong to generate the code it did? On what basis—what clause in the C standard does it violate? – Eric Postpischil May 05 '19 at 19:19
  • @EricPostpischil what code did it generate? I do not see any difference between the UB and not UB code in the case we discuss here https://godbolt.org/z/OsxJee . BTW i meant "using logic" :) – 0___________ May 05 '19 at 19:26
  • @P__J__: Your example shows a case where GCC did not generate different code. The issue is not whether the compiler may opt **not** to generate different code, but whether the compiler may opt **to** generate different code. I have given an example of that. GCC does, in some circumstances, generate code for undefined behavior that is not simply a run-time execution of the nominal meaning of the code—it is changed at compile time. Where is your reasoning that this is not permissible by the C standard? Do you deny that it happens? – Eric Postpischil May 05 '19 at 19:30
  • @EricPostpischil so please show me an example of a similar UB where the compiler **will** emit different code. So far you have not (except for that pseudocode); I mean a real compilable one. – 0___________ May 05 '19 at 19:46
  • @P__J__: Please answer my question: What clause in the C standard does a C implementation violate when it generates arbitrary code for “run-time” undefined behavior? – Eric Postpischil May 05 '19 at 19:49
  • @EricPostpischil it is wrong. When the object is optimized out the not initialized elements are considered zeroed. So the less trivial one https://godbolt.org/z/jzgf7C shows that the code is identical. – 0___________ May 05 '19 at 19:54
  • @P__J__: There is nothing “wrong.” The C code expresses an array look-up, but the generated code does not. And you are ducking the question: What in the C standard says a C implementation may not generate arbitrary code for “run-time” undefined behavior? You claimed this was “logic,” but you have not given **any** logic for that. What is the logic that prevents a compiler from doing this? – Eric Postpischil May 05 '19 at 20:27
  • @P__J__: Arbitrary means the compiler may generate any code. That includes optimized code. What in the C standard says a C implementation may not generate arbitrary code for “run-time” undefined behavior? You claimed this was “logic,” but you have not given any logic for that. What is the logic that prevents a compiler from doing this? – Eric Postpischil May 05 '19 at 20:30
  • In this case it does not have anything in common with the UB – 0___________ May 05 '19 at 20:34
0

Funny, my compiler behaves differently: When compiling it issues a warning:

% gcc strcpy.c -O3
In file included from /usr/include/string.h:494:0,
                 from strcpy.c:1:
In function ‘strcpy’,
    inlined from ‘main’ at strcpy.c:8:5:
/usr/include/x86_64-linux-gnu/bits/string_fortified.h:90:10: warning:
         ‘__builtin___memcpy_chk’ writing 11 bytes into a region of size 1 overflows the
         destination [-Wstringop-overflow=]
   return __builtin___strcpy_chk (__dest, __src, __bos (__dest));
          ^~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~

And when I run the program, it aborts:

% ./a.out                       
*** buffer overflow detected ***: ./a.out terminated