This is more of a policy or a historical question. Why was it decided not to provide a const char * conversion for std::string? Was there a fear that someone might write printf("%s", s) and believe it would convert automatically? Are there any open discussions on this issue?
4 Answers
Automatic casts are almost always evil. If there were a cast to const char *, a std::string could also be automatically cast to other pointer types, and that could lead to hard-to-find bugs. There is the c_str() method that returns const char *, so you can still achieve what you need. Also, the typecast is not logically correct - std::string is not equivalent to const char *.
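To illustrate the point about c_str(), here is a minimal sketch of spelling the conversion out at the boundary with C-style functions. The helper names `greet` and `c_length` are hypothetical and exist only for this example.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdio>
#include <cstring>
#include <string>

// Hypothetical helper: formats a greeting through a C-style API.
// The conversion to const char* is written explicitly with c_str().
std::string greet(const std::string& name) {
    char buffer[64];
    std::snprintf(buffer, sizeof buffer, "Hello, %s!", name.c_str());
    return std::string(buffer);
}

// Hypothetical helper: passes the string to a C function expecting const char*.
std::size_t c_length(const std::string& s) {
    return std::strlen(s.c_str());
}
```

Because every conversion is visible in the source, a reader can see exactly where a raw pointer into the string is handed out.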

-
+1: Agreed. There are examples of horrific cast overloading, even in the standard library. For example, consider `std::ios::operator void*`. – Oliver Charlesworth Nov 04 '10 at 11:38
-
+1. Also, what would the pointer point to if the string gets destroyed? – BЈовић Nov 04 '10 at 12:03
-
@VJo: True, but regular pointers (`const char *`) and `.c_str()` suffer from the same problem. – Karel Petranek Nov 04 '10 at 12:07
-
@dark_charlie - If the cast can cause problems then so can one .c_str() too many, as I would expect them to be equivalent. In any case, you are saying it is the lesser of two evils... I have for years been using a derived class with auto casting and I have yet to have any bugs because of it. But perhaps this is because I'm not using generic template meta-programming with agile Boost classes. :-) – Dov Grobgeld Nov 04 '10 at 12:15
-
@dov: If you know what you are doing and coded the derived string class yourself, then it's probably not a problem. But having this in a standard would mean everyone having this casting functionality - and not everyone is aware of automatic cast pitfalls, not to mention the performance problems that could arise as described by @edA-qa mort-ora-y. – Karel Petranek Nov 04 '10 at 12:21
-
@dark_charlie In C++, cases that require raw pointers are very rare. In this specific case, use std::string, and forget about const char* – BЈовић Nov 04 '10 at 13:14
-
@Oli: Though that particular example dates back to before the safe-bool idiom was understood and that type of use is replaced in 0x with an explicit operator bool. – Nov 04 '10 at 14:23
-
@VJo: That's right - I would even consider it a general rule of C++ programming. – Karel Petranek Nov 04 '10 at 16:35
The string class internally need not store the string with a terminating 0. In fact it doesn't even have to store them in contiguous memory if it didn't want to. Therefore an implicit cast doesn't make sense, since it may be a costly operation.
The c_str() function then gives you the c-string. Depending on how the library stores it internally this function may have to create a temporary. This temporary is only valid until you modify the string.
It is unfortunate, however, since a string could just have been specified to be a c-string internally. This wouldn't lead to any loss of functionality and would allow an implicit conversion.
Edit: The standard does basically imply the memory is contiguous (if accessed through data() or the [] operator), though it need not be internally, and certainly not null terminated. Likely all implementations store the 0 as well. If this were standardized then the implicit conversion could be safely defined.
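For reference, C++11 and later (which postdate this thread) do require std::string to store its characters contiguously and require c_str() and data() to return the same null-terminated array. A small sketch of checking those guarantees, assuming a C++11 or newer compiler; the function name is hypothetical:

```cpp
#include <cassert>
#include <string>

// Checks the storage guarantees that C++11 and later place on std::string.
// Hypothetical helper name, written for this example only.
bool storage_is_contiguous_and_terminated(const std::string& s) {
    // An empty string must still yield a valid, terminated C string.
    if (s.empty()) return s.c_str()[0] == '\0';

    // Neighbouring elements must sit next to each other in memory.
    for (std::string::size_type i = 0; i + 1 < s.size(); ++i) {
        if (&s[i] + 1 != &s[i + 1]) return false;
    }

    // data() and c_str() must agree, and the array must end with '\0'.
    return s.data() == s.c_str() && s.c_str()[s.size()] == '\0';
}
```

Under C++98/03, the subject of the discussion here, none of these checks were clearly guaranteed by the text of the standard.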

-
Actually, I think the string class, like vector, has to store it as contiguous memory. – Viktor Sehr Nov 04 '10 at 12:16
-
+1. This is better than dark_charlie's answer. The creation of a temporary object with non-trivial lifetime issues is more important than a somewhat surprising cast. – MSalters Nov 04 '10 at 12:17
-
Why do there need to be any semantic differences between a cast operator and a call to .c_str()? If they are equivalent then there is no difference (except for convenience) to use cast instead of the c_str() call. – Dov Grobgeld Nov 04 '10 at 12:31
-
@Viktor: No. Unlike `std::vector` (where the requirement was introduced for C++03), `std::string` does not have to use contiguous storage until you call `c_str()`. Then of course the pointer returned from that must point to contiguous storage. I vaguely remember a report (perhaps from Herb Sutter?) that the C++0x WG conducted a straw poll, and nobody present knew of an active implementation which *doesn't* use contiguous storage always. – Steve Jessop Nov 04 '10 at 13:24
-
The rationale behind allowing non-contiguous internal storage dates back to when strings used shared COW buffers. This was intended to be efficient in that it reduced the number of memory copies, but thread-contention issues often made it less efficient in reality as well as harder to implement. With COW buffers, if you add two strings together you can have a "rope" implementation temporarily until someone calls c_str() on it or writes to it. If a user is doing a lot of + operations, most will be temporary and they will never perform either of these, and there will be 1 ultimate copy. – CashCow Nov 04 '10 at 13:51
-
@Steve: Contiguous storage is [required for string via indexing](http://www.open-std.org/jtc1/sc22/wg21/docs/lwg-defects.html#530) according to Matt Austern (what's missing is an explicit guarantee for iterators). It was always intended that vector store contiguously, rather than being introduced in 03 — the standardese version of a bugfix. – Nov 04 '10 at 14:27
-
@Roger, reading the standard again I'd agree that it must be contiguous: str[n] is defined as str.data()[n]. Also, a few of the constructors require that data() is contiguous. – edA-qa mort-ora-y Nov 04 '10 at 14:38
-
@Roger: I said the *requirement* was introduced, agreed that the vague intention was there all along ;-). `operator[]` being defined in terms of `data()`, which *can invalidate pointers and iterators*, doesn't actually help the client much in treating the string storage itself as contiguous - you can only actually use it as a contiguous block by calling `data()` or `c_str()` and using the pointer. AFAIK, if you do `assert(&mystr[0] + 1 == &mystr[1])`, then the "second" call to `data()` is permitted to invalidate the first pointer. Undesirable, but not banned, hence a defect. – Steve Jessop Nov 04 '10 at 14:42
-
Hence, "I don't believe it's possible to write a useful and standard-conforming basic_string that isn't contiguous". It's possible to write a standard-conforming one, but pretty much only out of perversity. – Steve Jessop Nov 04 '10 at 14:45
-
@Steve: I'd say the requirement existed for vector all along (Stroustrup agrees, I can search for the quote if you like); it was an omission/bug to not spell it out explicitly, but anyone treating a vector as if it might be non-contiguous was (if they are following C++98) doing it wrong. Similarly for indexing std::string, and you can trust Austern's and the committee's interpretation (from what I linked above) for that. – Nov 04 '10 at 14:46
-
@Roger: I try not to read between the lines of standard in that way. You're second-guessing implementers, which is fine if they read between the lines in the same way you do, but not fine if you've missed some actual reason why an implementer would do something unexpected. Saying, "I doubt that any vector will be non-contiguous" might be sufficient for most practical purposes, but is a slightly different statement from saying, "I believe that the standard forbids non-contiguous vectors". Can I say that `copy_if` is required by the standard, since it too was left out by accident? ;-) – Steve Jessop Nov 04 '10 at 14:49
-
And as for Stroustrup: I absolutely believe that he *intended* to specify vector contiguous, but the authors of TC1 obviously believed that the C++98 standard failed to actually achieve this. As you say, it's a defect in the standard, and it was corrected, so the problem is solved now. I don't think it's (yet) solved for `string`. Putting it another way, one of the (slight) disadvantages of standardizing a language is that you throw common sense, inventor's intentions, and common practice, out of the window in favour of using a fixed text. – Steve Jessop Nov 04 '10 at 14:50
-
@Steve: You hint at the real problem: standards depend on interpretation and attempting to use the literal text is fraught with peril. Compare to the necessity of a law degree to "really" understand laws, even though they're written in English or another language most people understand perfectly well in other contexts. – Nov 04 '10 at 15:04
-
@Roger: I think the analogy is fun, but limited. The C++ standard is a model of precision and non-ambiguity compared with much or most legislation, at least here in the UK. For example, it uses the word "reasonable" only three times ;-) Some interpretation of language is needed to understand the C++ spec (as with any document, especially once the post-modernists get involved), but not a vast amount, and I think that creative interpretation isn't justified. We have language-lawyers, but we don't have (or need) language-judges and language-appeals-courts. It really is *much* easier than law. – Steve Jessop Nov 04 '10 at 15:14
-
@Steve: Is it really that much easier? I agree with Austern that string indexing is contiguous, but you don't. – Nov 04 '10 at 15:16
-
@Roger: Never mind interpreting the standard: I don't even agree with your interpretation of Austern's defect report ;-p. He doesn't say string is required to be contiguous. He says that any non-contiguous string implementation wouldn't be useful, but he carefully doesn't say that it wouldn't conform. We're both agreed that the sensible and good-faith thing for an implementer to do is to make it contiguous, and that all known implementers do this. It's easier than law because we aren't paid to *try* to drive trucks through any defects in the standard... – Steve Jessop Nov 04 '10 at 15:21
-
I would be interested to know what he'd say to my example `assert`, though, that I used to suggest that string indexing isn't merely permitted to be non-contiguous, it's permitted to be a completely inconsistent shifting quicksand of continual reallocation. I might ask a question, so we can bring in more people than just the two of us and edA-qa mort-ora-y, but I'll have to brace myself for a storm of "who cares about the standard, that would be stupid, so it can't happen no matter what the standard says". I've only just recovered from the last one I hit. – Steve Jessop Nov 04 '10 at 15:25
-
@Steve: From DR 530: "We almost require [complete] contiguity already. ... defines operator[] as data()[pos]. What's missing is a similar guarantee if we access the string's elements via the iterator interface." This says he interprets op[] as requiring contiguity, but accessing via iterators doesn't (e.g. &*s.begin()). I've added "complete" to hopefully clarify for you. Apparently the DR was later edited, probably while trying to align clause numbers from 03 to 0x, so the DR now states a different clause than it did originally where I used an ellipsis. – Nov 04 '10 at 15:32
-
Or not, the [first version of 530 I can find](http://www.open-std.org/jtc1/sc22/wg21/docs/papers/2005/n1926.html#530) includes that mistake. – Nov 04 '10 at 15:42
-
@Roger: The defect report says that if you never call `data()` or `operator[]`, then the string need not be stored contiguously. I'd call this a non-contiguous string implementation, by which I mean one that isn't *always* contiguous. I agree with Austern that in that sense, the standard doesn't require strings to be contiguous. I cautiously disagree with his implicit assumption that if you call `data()` multiple times, the standard requires that you get the same answer every time. But I may be wrong on that, it's the part I'm thinking of asking a question on. – Steve Jessop Nov 04 '10 at 15:42
-
@Steve, I was just about to say that as well. The `data()` and `operator[]` need to present a contiguous view of the string, but nothing says that is how it has to be implemented internally. – edA-qa mort-ora-y Nov 04 '10 at 15:44
-
I suppose basically: is it valid for `string` to create a new contiguous copy of the data every time `data()` is called, and free the old one? Obviously that's irrational, and it doesn't matter very much whether it's legal or not since nobody is planning to do it, I just can't for the moment see what makes it illegal. If I was a lawyer, I'd be advising my clients on their liability when they found a way to make money by doing exactly that ;-) – Steve Jessop Nov 04 '10 at 15:45
-
@Steve: Do you believe `assert(&s[0] == &s[0])` is required to be true? Even though the text uses "data()", string guarantees that after the first indexing operation, those references aren't invalidated by later indexing operations. (They can, of course, be invalidated by intervening operations that aren't indexing, but that's not the case here.) – Nov 04 '10 at 15:45
-
@Steve, the standard clearly states that a call to `data()` may invalidate any iterators, references and pointers to the string elements. 21.3.5. Of course it also says that the non-const operator[] should not invalidate them... so I'm guessing there is an inconsistency in the standard. – edA-qa mort-ora-y Nov 04 '10 at 15:48
-
@Roger: this is the thing. Obviously it *should* require that calling `data()` once "nails it down", I just haven't found anywhere that says that. Usually when the standard says "X returns Y", you can assume that X may *execute* Y. `data()` is permitted to invalidate, if called directly by the program. Normally that would mean X is permitted to invalidate too. But then, it is implied by omission that X can't do it (21.3/5, the last para in the list mentions the *first* call to operator[] but omits others). This isn't the standard's finest hour, IMO. – Steve Jessop Nov 04 '10 at 15:57
-
So, alongside asking about `assert(&s[0] == &s[0])`, we have to ask whether the standard truly requires `assert(s.data() == s.data())`. Any right-thinking person wants it to, of course, but there's no caveat in 21.3/5 about only the *first* call to `data()` being allowed to invalidate pointers. – Steve Jessop Nov 04 '10 at 16:00
-
Even worse, `data()` doesn't return a pointer to the elements of the string, it returns a pointer to an array whose elements are equal to the elements of the string. Nothing seems to talk about the lifetime of the array pointed to by `data()`, and most of the `string` section seems to assume that it's not an "other" array, it's the actual elements. In particular non-const `operator[]` allows you to modify the string, which clearly you couldn't do if that call to `data()` just points to a copy, and which you aren't permitted to do by calling `data()` yourself. – Steve Jessop Nov 04 '10 at 16:08
-
... so I think Austern's case rests on a part of the standard which is a bit inconsistent - we have the behaviour required of `data()`, we have stronger requirements on behaviour of `operator[]`, but then we have the weird inconsistency that `operator[]` is defined to return the same thing as `data()`. Or perhaps not inconsistency, perhaps there's a way to reconcile them without adding new text, by saying that since they're "the same", any restrictions on the implementation of either one of them apply to both. What do you think? – Steve Jessop Nov 04 '10 at 16:13
Regarding your previous comments:
If they are equivalent then there is no difference (except for convenience) to use cast instead of the c_str() call
There is one very important difference: one is implicit whereas the other is explicit.
C++0x introduces the notion of explicit cast operators, but until then they are implicit, meaning that it is never clear (when looking at the code) whether they will be used or not.
Implicit casts are bad, especially since they can be cascaded, leading to extremely obscure code.
Moreover, as already stated, there is a correctness problem here. Because the pointer returned by c_str is only valid as long as the string object does not change, you could find yourself with hard-to-find bugs. Consider:
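    void function()
    {
        // Hypothetical: this only compiles if std::string converts implicitly to char const*.
        std::map<int, char const*> map;
        map[1] = boost::lexical_cast<std::string>(47); // the temporary string dies at the end of this statement
        std::cout << map[1] << std::endl; // CRASH here, if lucky...
    }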
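As a sketch of the explicit cast operators mentioned above, assuming a C++11 compiler: the wrapper type `Name` and the function `as_c_string` are hypothetical, made up for this example. The conversion exists, but it only happens when the code asks for it.

```cpp
#include <cassert>
#include <cstring>
#include <string>
#include <type_traits>

// Hypothetical wrapper, not part of any library.
struct Name {
    std::string value;

    // Explicit: never chosen silently by the compiler.
    explicit operator const char*() const { return value.c_str(); }
};

// An implicit conversion such as `const char* p = name;` is rejected.
static_assert(!std::is_convertible<Name, const char*>::value,
              "Name must not convert to const char* implicitly");

// The conversion has to be requested in the source.
const char* as_c_string(const Name& n) {
    // The pointer is only valid while n.value is alive and unmodified.
    return static_cast<const char*>(n);
}
```

With this design the map assignment in the example above would fail to compile instead of storing a dangling pointer, although an explicit cast of a temporary would still dangle.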

My feeling is that C++ is a strongly typed language and implicit type conversions break type-safety.
It can often bite you where the conversion happens at a point where you do not expect it and can make your code hard to debug.
Non-explicit constructors can have a similar effect and std::string itself does have an implicit constructor from const char *. In this case it is not necessarily a bad thing although it can lead to inefficient code.
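To make the inefficiency concrete, here is a small sketch using a hypothetical type `Text` whose non-explicit constructor mirrors the one std::string has for const char *. It counts how many temporaries are built when a literal is passed repeatedly; the names `Text`, `length` and `count_conversions` are assumptions made for this example.

```cpp
#include <cassert>
#include <cstddef>
#include <string>

// Hypothetical type with a non-explicit constructor from const char*.
struct Text {
    static int conversions;  // number of implicit constructions so far
    std::string value;
    Text(const char* s) : value(s) { ++conversions; }
};
int Text::conversions = 0;

std::size_t length(const Text& t) { return t.value.size(); }

// Returns how many temporaries were built for `calls` calls with a literal.
int count_conversions(int calls) {
    Text::conversions = 0;
    std::size_t total = 0;
    for (int i = 0; i < calls; ++i) {
        total += length("hello");  // builds a fresh temporary Text each time
    }
    (void)total;  // the sum itself is not needed here
    return Text::conversions;
}
```

Each call silently constructs and destroys a temporary, which for std::string can mean an allocation and a copy per call.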
