7

Given:

char test[] = "bla-bla-bla";

Which of the two is more correct?

char *test1 = malloc(strlen(test));
strcpy(test1, test);

or

char *test1 = malloc(sizeof(test));
strcpy(test1, test);
LogicStuff
  • 19,397
  • 6
  • 54
  • 74
LLL
  • 1,273
  • 3
  • 10
  • 8
  • 13
    Attention: `strlen(test) != sizeof test`. The difference is 1. The terminating `'\0'` is not counted with `strlen()` but is accounted for with `sizeof`. – pmg Sep 12 '10 at 15:15
  • 2
    Though sizeof() works in this context I would discourage its use for this type of operation as arrays very easily decay into pointers and unless you are careful you may get caught out by this. – Martin York Sep 12 '10 at 19:13
  • 1
    prefer `strncpy()` over `strcpy()` for memory bounds safety. – Chimera Dec 22 '15 at 19:52

8 Answers8

28

This will work on all null-terminated strings, including pointers to char arrays:

char test[] = "bla-bla-bla";
char *test1 = malloc(strlen(test) + 1);
strcpy(test1, test);

You won't get the correct size of the array pointed to by char* or const char* with sizeof. This solution is therefore more versatile.

LogicStuff
  • 19,397
  • 6
  • 54
  • 74
Tomasz Kowalczyk
  • 10,472
  • 6
  • 52
  • 68
  • 7
    strlen(test) + 1. strlen doesn't count the necessary NULL. – Aram Hăvărneanu Sep 12 '10 at 14:59
  • 1
    Ok, as you wish. I always use sizeof(type) to have more variable-OS-friendly code. – Tomasz Kowalczyk Sep 12 '10 at 15:09
  • 4
    @Tomasz: Except `sizeof(char)` is defined as equal to `1` byte. As for the number of *bits* in it, however... – Jon Purdy Sep 12 '10 at 15:11
  • `sizeof(char)` is defined by the C standard to always be 1. – Tyler McHenry Sep 12 '10 at 15:11
  • 9
    So what is wrong with putting sizeof(char) into the code? It makes it more explicit in what you are trying to do. Thus if the code is ever changed to use wchar_t for example then it would be easier to spot where the code needs to be changed. So: Though not required here I see it as a benefit to include it in the code as it makes the meaning of the code more precise. – Martin York Sep 12 '10 at 19:11
  • 3
    I've strongly argued against writing `sizeof(char)` for years. It adds visual clutter, and the semantics of object size in C have been units of `sizeof(char)` since the beginning. That makes an expression clear and simple in a common case, and calls attention to the dependence on a typed allocation in the other cases. – RBerteig Sep 12 '10 at 22:19
  • 1
    @Martin York: Since you are using the return value of `strlen()`, `char` is implied - if you changed to `wchar_t` you would need to change to `wcslen()` anyway. If you do want to use `sizeof`, I suggest `sizeof test1[0]` (since that will automatically change if you change the type of `test1`). – caf Sep 13 '10 at 00:42
  • Why its working to me without the +1 in the end can some one help me understand? – BananaBuisness Jun 29 '16 at 00:36
3

Neither:

#include <string.h>
char *mine = strdup(test);
Matt Joiner
  • 112,946
  • 110
  • 377
  • 526
  • 6
    This is essentially equivalent to the version using `strlen` -- *on the implementations that provide it.* The problem, of course, is that it's an extension, so you can't depend on its presence. – Jerry Coffin Sep 12 '10 at 15:09
  • Of course you can easily write your own `strdup` if it's missing. – R.. GitHub STOP HELPING ICE Sep 12 '10 at 15:11
  • `strdup` is not defined by the Standard. After using it you need to call `free()`, so you must also `#include `. – pmg Sep 12 '10 at 15:11
  • @pmg: You'd need to #include `` to use `malloc` so that's kind of a given already. – Billy ONeal Sep 12 '10 at 15:33
  • 1
    @R: Although it's easy, if you write your own `strdup`, you get undefined behavior; names starting with `str` are reserved. Fortunately, it's just as easy to write your one `dupe_string` (or whatever other name you prefer). – Jerry Coffin Sep 12 '10 at 15:38
  • On most good systems, strdup is defined as a weak reference. This means you can override it for your own code. Of course this doesn't make it a good idea, but a quick fix for crappy systems that don't provide it. – Matt Joiner Sep 12 '10 at 17:14
3

You should use strlen, because sizeof will fail silently if you change test to be a run-time defined string. This means that strlen is a far safer idea than sizeof as it will keep working.

LogicStuff
  • 19,397
  • 6
  • 54
  • 74
Puppy
  • 144,682
  • 38
  • 256
  • 465
  • at run-time the variable also can be uninitialized, strlen/strcpy will crash. – user411313 Sep 12 '10 at 21:42
  • At least in that case you'd get a crash instead of silently the wrong value. In addition, it's much easier for the compiler to warn you about potential uninitialized variable use. Finally, if you pass an uninitialized variable, it doesn't matter how you determined it's size. – Puppy Sep 12 '10 at 21:53
2
char test[]="bla-bla-bla";
char *test1 = malloc(strlen(test) + 1); // +1 for the extra NULL character
strcpy(test1, test);
mmonem
  • 2,811
  • 2
  • 28
  • 38
2

I think sizeof is the correct one. Reason behind that is strlen(str) will give you length of the string( excluding the terminating null). And if you are using strcpy, it actually copy the whole string including the terminating null, so you will allocate one byte less if you use strlen in malloc. But sizeof gives the size of the string pointed by test, including the terminating null, so you will get correct size malloc chunk to copy the string including the terminating null.

LogicStuff
  • 19,397
  • 6
  • 54
  • 74
Anil Vishnoi
  • 1,352
  • 3
  • 18
  • 25
  • sizeof() will return a surprisingly 'wrong' result if you pass a (char *) object to it. It will return "the number of bytes necessary to store a pointer to 'char'", which is not necessarily equal to the size of the string pointed at by the pointer. – Giorgos Keramidas Sep 12 '10 at 15:36
  • +1: In the code shown, the `sizeof()` solution is correct and the `strlen()` one is incorrect. The `sizeof()` solution might over-allocate memory if the string has been shortened since it was created. However, the definition of the array must be in scope for it to work - so the more generally safe solution uses `strlen(str)+1`, or assumes a POSIX environment and uses `strdup()`. – Jonathan Leffler Sep 12 '10 at 15:38
1

1) definitely causes UB

2) may cause UB (if malloc fails)

I'd go with 2) as there is a better chance of the construct working as intended; or even better I'd write a version that works as intended (without UB) in all situations.


Edit

  • Undefined Behaviour in 1)

    test1 will have space for the characters in test, but not for the terminating '\0'. The call to strcpy() will try to write a '\0' to memory that does not belong to test1, hence UB.

  • Undefined Behaviour in 2)

    If the call to malloc() fails to reserve the requested memory, test1 will be assigned NULL. Passing NULL to strcpy() invokes UB.

The return value of calls to malloc() (and calloc() and friends) should always be tested to ensure the operation worked as expected.

pmg
  • 106,608
  • 13
  • 126
  • 198
  • You know, it would probably be helpful to the OP if you would explain why UB is getting involved (the fact that he's not handling malloc failures). – Billy ONeal Sep 12 '10 at 15:32
  • Actually, both cases have the same UB due to failure to validate the return of `malloc()`. The first case has the bonus UB from not allocating room for the NUL. – RBerteig Sep 12 '10 at 22:24
0

(1) with strlen but not adding 1 is definitely incorrect. If you add 1, it would have the added benefit that it also works for pointers, not just arrays.

On the other hand, (2) is preferred as long as your string is actually an array, as it results in a compile-time constant, rather than a call to strlen (and thus faster and smaller code). Actually a modern compiler like gcc can probably optimize the strlen out if it knows the string is constant, but it may be hard for the compiler to determine this, so I'd always use sizeof when possible.

R.. GitHub STOP HELPING ICE
  • 208,859
  • 35
  • 376
  • 711
  • In all situations where you can correctly use `sizeof`, the compiler **will** figure out that `strlen` can be optimized. So there’s simply no reason to use `sizeof` *at all*. – Konrad Rudolph Sep 12 '10 at 15:40
  • 1
    @Konrad: `char test[] = "bla bla \0 bla";`; `strlen(test) == 8` – pmg Sep 12 '10 at 15:59
  • @Konrad: not true. It's possible the contents of the string are variable, but always fit in a small fixed-size array. In this case, `strlen` **cannot** be optimized out, but allocating a fixed-size object the same size as the array is always suitable. – R.. GitHub STOP HELPING ICE Sep 12 '10 at 16:33
  • 1
    @pmg: that’s a completely different use-case. We were (implicitly) talking about zero-delimited strings here, otherwise `strlen` would make no sense at all. When dealing with raw data that may contain null bytes, treating that data as a (zero-delimited) string is obviously a bug. – Konrad Rudolph Sep 12 '10 at 16:33
-1

If it is a critical path, sizeof has advantage over strlen as it has an O(1) complexity which can save CPU cycles.

LogicStuff
  • 19,397
  • 6
  • 54
  • 74
  • 1
    sizeof is computed at compile time and at every place sizeof is replaced by an integer as computed by compiler for that. There is no more run time computation for sizeof however for strlen the complete string is parsed once. – user3141529 Feb 02 '14 at 17:39