Consider the following code:

const char foo[] = "lorem ipsum"; // foo is an array of 12 characters
const auto length = strlen(foo); // length is 11
string bar(length, '\0'); // bar was constructed with string(11, '\0')

strncpy(data(bar), foo, length);
cout << data(bar) << endl;

My understanding is that strings are always allocated with a hidden null element. If this is the case then bar really allocates 12 characters, with the 12th being a hidden '\0' and this is perfectly safe... If I'm wrong on that then the cout will result in undefined behavior because there isn't a null terminator.

Can someone confirm for me? Is this legal?


There have been a lot of questions about why I use strncpy instead of just using the string(const char*, const size_t) constructor. My intent was to keep my toy code close to my actual code, which contains a vsnprintf. Unfortunately, even after getting excellent answers here, I've found that vsnprintf doesn't behave the same as strncpy, and I've asked a follow-up question here: Why is vsnprintf Not Writing the Same Number of Characters as strncpy Would?

Jonathan Mee
  • How could it not be? Assuming of course, that you don't copy more bytes than there is available buffer space. – enhzflep Jan 10 '19 at 16:09
  • @TrebuchetMS Yup thanks, I've fixed that comment. – Jonathan Mee Jan 10 '19 at 16:10
  • Do you have an actual use case for this? If you give a `std::string` a c-string it will do the same thing without any head scratching. – NathanOliver Jan 10 '19 at 16:10
  • @NathanOliver Yeah, I'm using `vsnprintf` to populate a `string`. Just seemed to add more complexity to the question to ask with that, and didn't force the question. – Jonathan Mee Jan 10 '19 at 16:13
  • what is `string` and what is `data()` ? – Slava Jan 10 '19 at 16:18
  • @Slava I have an evil `using namespace std` before this toy example. So all this stuff is pulled from the standard namespace. – Jonathan Mee Jan 10 '19 at 16:20
  • Then why not `std::string bar( foo, length );` instead of pesky `strncpy()`? – Slava Jan 10 '19 at 16:21
  • @JonathanMee Any reason you don't use a `stringstream` so you can avoid this entirely? – NathanOliver Jan 10 '19 at 16:29
  • @Slava Actually cause that's not how I'm doing it in my real code. But that would have worked. In my real code I'm calling `vsnprintf` twice, once to get the size, which I use to allocate a `string`, then I call `vsnprintf` again populating the `string`. – Jonathan Mee Jan 10 '19 at 16:34
  • Why not use `char buffer[arbitrarySize]`, then `vsnprintf` into it, and then just create a `std::string` from that buffer and size? Do you really think that calling `vsnprintf()` twice is more efficient than creating a char array on the stack? – Slava Jan 10 '19 at 16:38
  • @NathanOliver The `va_list` is being used as a way to allow a C interface to take logging information. So it's not guaranteed that the other side of the interface will even have a `stringstream`. – Jonathan Mee Jan 10 '19 at 16:39
  • @Slava `arbitrarySize` might be too small right? – Jonathan Mee Jan 10 '19 at 16:43
  • Did you mean `cout << bar << endl`? Otherwise I'm not sure of the point of that part of the question – Lightness Races in Orbit Jan 10 '19 at 16:49
  • Probably. Anyway I would put back `std` into your example and change it to do `snprintf( bar.data(), bar.size(), "format", data )` instead of `strncpy()` to avoid confusion. – Slava Jan 10 '19 at 16:53
  • @Slava The double-call is quite a common approach and avoids having to randomly guess at the size you need. Though tradeoffs vary. – Lightness Races in Orbit Jan 10 '19 at 16:54
  • @LightnessRacesinOrbit This is really a question about the underlying `string` I want to know if the 12th "hidden null character" will exist even if I construct it like `string(11, '\0')` – Jonathan Mee Jan 10 '19 at 16:59
  • @LightnessRacesinOrbit sure then question should reflect that instead of ugly example of using `strncpy()` which leads to question, why not to create `std::string` directly from it? Using `snprintf()` in this example would not complicate it at all. – Slava Jan 10 '19 at 17:02
  • @Slava I agree that the use of `strncpy` here is not necessary and should be discouraged (and I did upvote your comment to that effect some time ago). Though that doesn't invalidate the question! – Lightness Races in Orbit Jan 10 '19 at 17:33
  • @LightnessRacesinOrbit I did not say it does (and I would downvote the question if I think so), I just suggested improvements. Using `snprintf` or alike is understandable, using `strncpy` in this case stimulates unrelated discussion. – Slava Jan 10 '19 at 18:40
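
The double-call approach mentioned in the comments (one `vsnprintf` call to measure, a second to fill the `string`) might look roughly like the sketch below. The helper name `format` is hypothetical and not from the thread, and the sketch assumes C++17 for the non-const `data()`. Because `vsnprintf` writes at most one character fewer than the buffer size and then a null, the string is sized one larger and trimmed afterwards, so every write stays inside [0, size()).

```cpp
#include <cassert>
#include <cstdarg>
#include <cstdio>
#include <string>

// Hypothetical helper: formats printf-style arguments into a std::string.
std::string format(const char* fmt, ...) {
    va_list args;
    va_start(args, fmt);
    va_list args_copy;
    va_copy(args_copy, args);  // a va_list may only be consumed once

    // First call: measure the formatted length without writing anything.
    const int n = std::vsnprintf(nullptr, 0, fmt, args);
    va_end(args);
    if (n < 0) {               // encoding error
        va_end(args_copy);
        return {};
    }

    // Second call: write n characters plus the '\0' that vsnprintf appends.
    std::string out(static_cast<std::size_t>(n) + 1, '\0');
    const int written = std::vsnprintf(out.data(), out.size(), fmt, args_copy);
    va_end(args_copy);
    assert(written == n);

    out.resize(static_cast<std::size_t>(n));  // drop the extra character
    return out;
}
```

Sizing the string to n + 1 and then calling `resize(n)` avoids relying on writes to the terminator position at all.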

4 Answers

This is safe, as long as you only write to characters in the range [0, size()) of the string. Per [basic.string]/3:

In all cases, [data(), data() + size()] is a valid range, data() + size() points at an object with value charT() (a “null terminator”), and size() <= capacity() is true.

So string bar(length, '\0') gives you a string with a size() of 11, with an immutable null terminator at the end (for a total of 12 characters in actual size). As long as you do not overwrite that null terminator, or try to write past it, you're okay.
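
As a minimal sketch of the point above, assuming C++17 (for the non-const `data()`), the function below mirrors the question's code and checks that the terminator at `data()[size()]` is still a null after the copy. The name `copy_into_string` is made up for illustration.

```cpp
#include <cassert>
#include <cstring>
#include <string>

// Copies a C string into a std::string of exactly strlen(src) characters.
std::string copy_into_string(const char* src) {
    const std::size_t length = std::strlen(src);
    std::string bar(length, '\0');          // size() == length
    std::strncpy(bar.data(), src, length);  // writes only to [0, size())
    // The terminator at data()[size()] was never written to and is still '\0'.
    assert(bar.data()[bar.size()] == '\0');
    return bar;
}
```

Calling `copy_into_string("lorem ipsum")` should return a string of size 11 that compares equal to "lorem ipsum".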

NathanOliver
  • Not sure if he will be okay - `std::string::length()` will give "wrong" information – Slava Jan 10 '19 at 16:22
  • @Slava How would it be wrong? The string starts with a size of `11` which can't change unless they use a string function. If they only copy 5 characters they still have a string of size 11, it just has 6 extra nulls terminating it. – NathanOliver Jan 10 '19 at 16:24
  • I put "wrong" in double quotes. Yes, it will show the right size of the buffer, but using that as a string may lead to ugly issues that are very difficult to catch and fix later when this string is used. – Slava Jan 10 '19 at 16:26
  • @Slava a `std::string` is allowed to contain null characters, unlike a C-string. – Mark Ransom Jan 10 '19 at 16:26
  • `strlen` only counts C-style strings. Indeed, the size given by the string is the proper one. – Matthieu Brucher Jan 10 '19 at 16:28
  • @MarkRansom I understand that, so I do not claim that it will be broken for sure, but there is a pretty high chance of problems downstream, as most developers expect `length()` to give you the length of the string, not the size of the buffer. So I would avoid such code if I do not want problems and use `std::vector` if I need a buffer. – Slava Jan 10 '19 at 16:28
  • @Slava I can definitely see the potential for problems, but I think you're overblowing it. `length()` really is giving you the true size of the string, it's `strlen()` that is confused. You'll only run into a problem when you mix code that operates on `string` and C-strings, and then only if you have an embedded null character, which isn't the case in the example given. – Mark Ransom Jan 10 '19 at 16:32
  • @MarkRansom "You'll only run into a problem when you mix code that operates on string and C-strings" Right, so you are telling a person who asks whether it is fine to use `strncpy()` that they will probably be OK, as it is unlikely they will deal with C-strings. Yeah, I am overblowing it. – Slava Jan 10 '19 at 16:33
  • @Slava I hear the sarcasm in your response. I would hope that the reason for copying to `string` in the first place is to work on it in that form from that point forward, at which point an embedded null is mostly harmless. Again as demonstrated by the question. – Mark Ransom Jan 10 '19 at 16:40
  • @MarkRansom I am trying to say if somebody trying to `strncpy()` into `std::string` then there is pretty high chance that this string would be converted/treated as C-string later. So I would avoid using `std::string` as a buffer if I do not want to spend long time debugging difficult to catch problems. But that is just my humble opinion so I put "wrong" in double quotes. – Slava Jan 10 '19 at 16:45
  • @Slava _"as most developers expect length() to give you length of string, not the size of the buffer."_ But that's exactly what it does. A string is a sequence of bytes. This one has some null bytes towards the end of it. You should get out of the habit of thinking of "string" as being equivalent to "c-string". (Meanwhile, the buffer could be somewhat larger thanks to `.reserve()` and so forth) – Lightness Races in Orbit Jan 10 '19 at 16:50
  • @Slava "then there is pretty high chance that this string would be converted/treated as C-string later" *treated* may be a problem, but *converted* (using strlen and strcpy of the data()) should be safe. – Bob__ Jan 10 '19 at 16:57
  • @LightnessRacesinOrbit "You should get out of the habit of thinking of "string" as being equivalent to "c-string"." The problem is not me getting out of this habit but forcing all the developers around me to do the same. I do not have that power, so it is safer to use `std::vector` when I need a buffer and `std::string` when I need a string which could possibly be treated as a C-string. – Slava Jan 10 '19 at 17:00
  • @Slava If the developers around you are treating `std::string` like a C-string then they are not C++ developers.... (although I will agree that a `vector` -- or `vector`! -- is often superior, for a number of reasons) – Lightness Races in Orbit Jan 10 '19 at 17:14

There are two different things here.

First, does strncpy add an additional \0 in this instance (11 non-\0 elements to be copied into a string of size 11)? The answer is no:

Copies at most count characters of the byte string pointed to by src (including the terminating null character) to character array pointed to by dest.

If count is reached before the entire string src was copied, the resulting character array is not null-terminated.

So the call is perfectly fine.

Then data() gives you a proper \0-terminated string:

c_str() and data() perform the same function. (since C++11)

So it seems that since C++11, you are safe. Whether the string allocates an additional \0 or not doesn't seem to be indicated in the documentation, but the API is clear that what you are doing is perfectly fine.
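
To illustrate the first point, here is a small sketch using a plain char array filled with a sentinel value, rather than a `std::string`. The function name and the 'x' sentinel are illustrative choices. When count is reached before the source's null, `strncpy` writes exactly count characters and nothing after them.

```cpp
#include <cassert>
#include <cstring>

// Returns true if the byte just past the copied range kept its sentinel value.
bool strncpy_leaves_tail_untouched() {
    char dest[12];
    std::memset(dest, 'x', sizeof dest);    // fill with a non-null sentinel
    std::strncpy(dest, "lorem ipsum", 11);  // count reached before the source's '\0'
    assert(std::memcmp(dest, "lorem ipsum", 11) == 0);
    return dest[11] == 'x';                 // no '\0' was appended
}
```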

Matthieu Brucher
  • To extend your statement then, you're saying that `string` *is not* allocated with a hidden null terminator? – Jonathan Mee Jan 10 '19 at 16:18
  • The allocation must work that way in order for `c_str()`/`data()` to be `O(1)`, given that indexing `str[str.size()]` gives you a `'\0'` (since C++11). In practice they always worked this way. – Lightness Races in Orbit Jan 10 '19 at 16:52

You have allocated an 11-character std::string. You are not trying to read nor write anything past that, so that part will be safe.

So the real question is whether you have messed up the internals of the string. Since you haven't done anything that isn't allowed, how would that be possible? If it's required for the string to internally keep a 12-byte buffer with a null padding at the end in order to fulfill its contract, that will be the case no matter what operations you performed.

Mark Ransom

Yes, it's safe, according to the documentation for char* strncpy(char* destination, const char* source, size_t num):

Copy characters from string

Copies the first num characters of source to destination. If the end of the source C string (which is signaled by a null-character) is found before num characters have been copied, destination is padded with zeros until a total of num characters have been written to it.
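
The padding behaviour described in that quote can be sketched as follows, assuming C++17 for the non-const `data()`. The function name and the '#' fill character are illustrative only. When the source is shorter than num, the remaining characters are set to null, while the `std::string` keeps its original size.

```cpp
#include <cassert>
#include <cstring>
#include <string>

// Copies a source shorter than n into a std::string of size n.
std::string copy_short_source(const char* src, std::size_t n) {
    assert(std::strlen(src) < n);      // precondition for this demonstration
    std::string bar(n, '#');           // '#' makes the zero padding visible
    std::strncpy(bar.data(), src, n);  // pads with '\0' up to n characters
    return bar;
}
```

With `copy_short_source("lorem", 11)`, the result should still have a size() of 11, with five letters followed by six nulls, which is the situation discussed in the comments on the first answer.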

chronoxor