What should a string-accepting interface look like?

Question

This is a follow up of this question. Suppose I write a C++ interface that accepts or returns a const string. I can use a const char* zero-terminated string:

void f(const char* str); // (1)

The other way would be to use an std::string:

void f(const string& str); // (2)

It's also possible to write an overload and accept both:

void f(const char* str); // (3)
void f(const string& str);

Or even a template in conjunction with boost string algorithms:

template<class Range> void f(const Range& str); // (4)

My thoughts are:

(1) is not C++ish and may be less efficient when subsequent operations may need to know the string length.
(2) is bad because now f("long very long C string"); invokes a construction of std::string which involves a heap allocation. If f uses that string just to pass it to some low-level interface that expects a C-string (like fopen) then it is just a waste of resources.
(3) causes code duplication, although one f can call the other depending on which implementation is more efficient. However, we can't overload on return type, as in the case of std::exception::what(), which returns a const char*.
(4) doesn't work with separate compilation and may cause even larger code bloat.
Choosing between (1) and (2) based on what's needed by the implementation is, well, leaking an implementation detail to the interface.
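The allocation concern raised for (2) can be checked directly. The following is a minimal sketch, assuming an unoptimized build and a string longer than any short-string buffer; `take_pointer`, `take_string`, `long_text` and the counter are hypothetical names introduced for illustration, and an optimizing compiler may behave differently.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdlib>
#include <new>
#include <string>

// Counts every call to the replaced global operator new (illustration only).
static std::size_t g_allocations = 0;

void* operator new(std::size_t size) {
    ++g_allocations;
    if (void* p = std::malloc(size != 0 ? size : 1)) {
        return p;
    }
    throw std::bad_alloc();
}

void operator delete(void* p) noexcept { std::free(p); }
void operator delete(void* p, std::size_t) noexcept { std::free(p); }

// Hypothetical stand-ins for signatures (1) and (2).
std::size_t take_pointer(const char* str) { return str[0] != '\0' ? 1 : 0; }
std::size_t take_string(const std::string& str) { return str.size(); }

// Assumed to be longer than any common short-string buffer.
static const char* const long_text =
    "a long, very long C string that does not fit in a small buffer";

std::size_t allocations_for_pointer_call() {
    const std::size_t before = g_allocations;
    take_pointer(long_text);  // no std::string involved
    return g_allocations - before;
}

std::size_t allocations_for_string_call() {
    const std::size_t before = g_allocations;
    take_string(long_text);  // builds a temporary std::string from the pointer
    return g_allocations - before;
}

void check_allocations() {
    assert(allocations_for_pointer_call() == 0);
    assert(allocations_for_string_call() >= 1);
}
```

With a short literal, an implementation that uses a short-string optimization may report no allocation for the `take_string` call either.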

The question is: what is the preferred way? Is there any single guideline I can follow? What's your experience?

Edit: There is also a fifth option:

void f(boost::iterator_range<const char*> str); // (5)

which has the pros of (1) (doesn't need to construct a string object) and (2) (the size of the string is explicitly passed to the function).
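A rough sketch of what option (5) could look like without the Boost dependency. The `char_range` type below is hypothetical (it is neither Boost's `iterator_range` nor a standard type); it only illustrates a non-owning pair of pointers that converts implicitly from both a C string and a `std::string`.

```cpp
#include <cassert>
#include <cstddef>
#include <cstring>
#include <string>

// Hypothetical non-owning range of characters; it never copies the data.
struct char_range {
    const char* first;
    const char* last;

    // Implicit on purpose, so that f("literal") and f(some_string) both work.
    char_range(const char* s) : first(s), last(s + std::strlen(s)) {}
    char_range(const std::string& s)
        : first(s.data()), last(s.data() + s.size()) {}

    std::size_t size() const { return static_cast<std::size_t>(last - first); }
};

// Hypothetical callee: the length is known without scanning for a terminator.
std::size_t count_spaces(char_range str) {
    std::size_t spaces = 0;
    for (const char* p = str.first; p != str.last; ++p) {
        if (*p == ' ') ++spaces;
    }
    return spaces;
}

void char_range_usage() {
    assert(count_spaces("a b c") == 2);             // from a string literal
    assert(count_spaces(std::string("x y")) == 1);  // from a std::string
    assert(char_range("abc").size() == 3);
}
```

The range must not outlive the string it points into, and it carries no promise that the characters are zero-terminated.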

In case (2) there will be no heap allocation. The string will be constructed on the stack. — Industrial-antidepressant, Jan 09 '11 at 17:58
@nice: Right, std::string itself is allocated on the stack. But if your string is long enough or your implementation doesn't use a short-string optimization then std::string will allocate its storage on the heap. — Yakov Galka, Jan 09 '11 at 18:02
I think heap allocation will occur only when the std::string copy constructor is called — Industrial-antidepressant, Jan 09 '11 at 18:07
@nice: then you're wrong. You're welcome to overload new and verify it yourself (don't forget to use a "looooong string"). — Yakov Galka, Jan 09 '11 at 18:11
You are right, I didn't know that. It will cause a heap allocation. — Industrial-antidepressant, Jan 09 '11 at 18:25
@Kos: it won't solve any problem here. Even an immutable string must copy the data according to its semantics. — Yakov Galka, Jan 09 '11 at 18:37
"accepts" and "returns" should probably be considered separately (separate options, I mean, not necessarily separate question), since if you're *returning* a string then you have a question of memory management to deal with. Personally I don't care if a function accepts `const char*` and I have a `string` to pass it, since it's a trivial difference in the calling code. The other way round is trivial code, if not necessarily trivial performance. If it returns a `const char*`, though, then I have to worry about who frees it, whereas if it returns a `string` (object, not reference) I don't. — Steve Jessop, Jan 09 '11 at 18:38
@ybungalobill - I meant that if the committee had agreed that std::string is immutable, the consequences would allow creating it from a `const char*` in a light way - so (possibly) without any runtime overhead, depending on the actual implementation and some design decisions. Sad it isn't the case. — Kos, Jan 10 '11 at 17:25
@Kos: I perfectly understood what you meant. But you cannot: `immutable_string f() { char buf[128]; ... return buf; }` immutable_string must do a copy when initialized with a non-immutable_string. — Yakov Galka, Jan 10 '11 at 17:39

score 6 · Answer 1 · answered Jan 09 '11 at 17:45


If you are dealing with a pure C++ code base, then I would go with #2, and not worry about callers of the function that don't use it with a std::string until a problem arises. As always, don't worry too much about optimization unless there is a problem. Make your code clean, easy to read, and easy to extend.


Mark Loeser


But why do you prefer 2 to, e.g., 1? It doesn't make it cleaner, easier to read or easier to extend! – Yakov Galka Jan 09 '11 at 17:48

@ybungalobill: Because if I'm writing C++, I'd rather deal with C++ constructs unless I run into problems with performance that I need to start addressing. – Mark Loeser Jan 09 '11 at 17:49

@ybungalobill `const char* str` is a *pointer to a char* by definition. It's a *string* only by convention. That's why in C++, 2 is cleaner. – Oswald Jan 09 '11 at 17:54

@Mark, one of the great things about C++ is that you don't have to restrict yourself to one strict approach - using `const char*` doesn't mean it's bad C++, it's just a different style, there's nothing inherently wrong with it as long as it makes sense! – Nim Jan 09 '11 at 17:56
@Nim: This is an opinion question, so we'll all have different opinions :) I prefer to not deal with pointers in C++ unless I have to. – Mark Loeser Jan 09 '11 at 18:01
@ybungalobill: (1) makes it difficult to pass in arbitrary data sometimes -- for example, a `basic_string` can contain `null` characters, but a null terminated string cannot. – Billy ONeal Jan 09 '11 at 19:22

Oswald · Answer 2 · 2011-01-09T18:16:08.040

4

There is a single guideline you can follow: use (2) unless you have very good reasons not to.

A const char* str as a parameter does not make explicit what operations are allowed to be performed on str. How often can it be incremented before it segfaults? Is it a pointer to a char, an array of chars or a C string (i.e. a zero-terminated array of char)?

edited Jan 09 '11 at 18:16

answered Jan 09 '11 at 17:50

Oswald



You could also follow the guideline: use (1) unless you have very good reasons not to. Can you justify your guideline over the alternative? – CB Bailey Jan 09 '11 at 18:03

All valid concerns; why not put them in your answer? – CB Bailey Jan 09 '11 at 18:13
@Charles Bailey I have not thought of that. I now moved the comment into the answer. – Oswald Jan 09 '11 at 18:17

score 3 · Answer 3 · answered Jan 09 '11 at 21:08

I don't really have a single hard preference. Depending on circumstances, I alternate between most of your examples.

Another option I sometimes use is similar to your Range example, but using plain old iterator ranges:

template <typename Iter>
void f(Iter first, Iter last);

which has the nice property that it works easily with both C-style strings (and allows the callee to determine the length of the string in constant time) as well as std::string.

If templates are problematic (perhaps because I don't want the function to be defined in a header), I sometimes do the same, but using char* as iterators:

void f(const char* first, const char* last);

Again, it can be trivially used with both C-strings and C++ std::string (as I recall, C++03 doesn't explicitly require strings to be contiguous, but every implementation I know of uses contiguously allocated strings, and I believe C++0x will explicitly require it).

So these versions both allow me to convey more information than the plain C-style const char* parameter (which loses information about the string length, and doesn't handle embedded nulls), in addition to supporting both of the major string types (and probably any other string class you can think of) in an idiomatic way.

The downside is of course that you end up with an additional parameter.

Unfortunately, string handling isn't really C++'s strongest side, so I don't think there is a single "best" approach. But the iterator pair is one of several approaches I tend to use.
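A small sketch of the non-template iterator-pair form described above, showing that it accepts both string kinds and sees embedded nulls. The function name `count_zero_chars` is a hypothetical example, not something from the answer.

```cpp
#include <cassert>
#include <cstddef>
#include <cstring>
#include <string>

// Hypothetical callee using the (const char* first, const char* last) form.
// The length is simply last - first, and embedded '\0' characters are visible.
std::size_t count_zero_chars(const char* first, const char* last) {
    std::size_t zeros = 0;
    for (const char* p = first; p != last; ++p) {
        if (*p == '\0') ++zeros;
    }
    return zeros;
}

void iterator_pair_usage() {
    // C string: the caller pays for one strlen to find the end.
    const char* c_str = "plain";
    assert(count_zero_chars(c_str, c_str + std::strlen(c_str)) == 0);

    // std::string with an embedded null, passed without any copy.
    const std::string with_nul("a\0b", 3);
    assert(count_zero_chars(with_nul.data(),
                            with_nul.data() + with_nul.size()) == 1);
}
```

The cost is visible at the call site: every caller has to spell out both ends of the range.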

+1. the advantage of the single parameter range is that it allows automatic conversion from both, std::string and C-strings, so user's code remains as simple as before: f("hello"). That's impossible with two parameters. I wonder why C++0x standard doesn't do anything in this direction for fstream::open... — Yakov Galka, Jan 09 '11 at 21:28
The problem with the single parameter range is that you get a dependency on Boost.Range if you want it to *Just Work* with C-strings. But you're right, the syntax for that is certainly more convenient. — jalf, Jan 09 '11 at 22:42
It doesn't necessarily have to be a Boost range. Boost range is just an example; it also won't provide automatic conversion from std::string, so some derived type will be defined anyway. However, on second thought, there is another problem with this approach: unlike (1), (5) doesn't have zero-terminated semantics. That means that it doesn't fully solve the problem when you use it with a low-level function expecting a zero-terminated string. In such a case you need to create a zero-terminated copy anyway. Any ideas? — Yakov Galka, Jan 10 '11 at 17:46
Yeah, I was hesitant to mention the dependency on Boost, because it's not necessarily on boost. But you do need a dependency on *some* range implementation. And if you do need to work with a null-terminated string, I'd prefer to handle that at the call site. Call `strlen` once to find the null, and then generate a range (or iterator pair) based on that, which I can pass to the function. That way, the contract becomes clearer, and there's no ambiguity about what the function parameter means. — jalf, Jan 10 '11 at 23:50

CB Bailey · Accepted Answer · 2011-01-09T18:11:58.880

For taking a parameter I would go with whatever is simplest, and often that is const char*. This works with string literals at zero cost, and retrieving a const char* from something stored in a std::string is typically very low cost as well.

Personally, I wouldn't bother with the overload. In all but the simplest cases you will want to merge the two code paths and have one call the other at some point or both call a common function. It could be argued that having the overload hides whether one is converted to the other or not and which path has a higher cost.

Only if I actually wanted to use const features of the std::string interface inside the function would I have const std::string& in the interface itself and I'm not sure that just using size() would be enough of a justification.

In many projects, for better or worse, alternative string classes are often used. Many of these, like std::string, give cheap access to a zero-terminated const char*; converting to a std::string requires a copy. Requiring a const std::string& in the interface is dictating a storage strategy even when the internals of the function don't need to specify this. I consider this to be undesirable, much like taking a const shared_ptr<X>& dictates a storage strategy whereas taking X&, if possible, allows the caller to use any storage strategy for a passed object.

The disadvantages of a const char* are that, purely from an interface standpoint, it doesn't enforce non-nullness (although very occasionally the difference between a null parameter and an empty string is used in some interfaces - this can't be done with std::string), and a const char* might be the address of just a single character. In practice, though, the use of a const char* to pass a string is so prevalent that I would consider citing this as a negative to be a fairly trivial concern. Other concerns, such as whether the encoding of the characters is specified in the interface documentation (applies to both std::string and const char*), are much more important and likely to cause more work.
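To illustrate the storage-strategy point, here is a minimal sketch in which one `const char*` interface serves a literal, a `std::string`, and an alternative string class without copying. Both `fixed_string` and `name_length` are hypothetical names invented for this example.

```cpp
#include <cassert>
#include <cstddef>
#include <cstring>
#include <string>

// Hypothetical alternative string class with its own storage strategy:
// a fixed in-place buffer, always zero-terminated.
class fixed_string {
    char buf_[32];

public:
    explicit fixed_string(const char* s) {
        std::strncpy(buf_, s, sizeof buf_ - 1);
        buf_[sizeof buf_ - 1] = '\0';
    }
    const char* c_str() const { return buf_; }
};

// Hypothetical interface taking const char*: it does not care who owns
// the characters or how they are stored.
std::size_t name_length(const char* name) { return std::strlen(name); }

void storage_strategy_usage() {
    assert(name_length("abc") == 3);  // string literal, no object created

    const std::string s("hello");
    assert(name_length(s.c_str()) == 5);  // std::string, no copy

    const fixed_string f("hi");
    assert(name_length(f.c_str()) == 2);  // alternative class, no copy
}
```

Had `name_length` taken a `const std::string&`, the `fixed_string` caller would have had to build a `std::string` copy first.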

I've always preferred plain const char*, for the same reasons as you. Btw, there is a fifth option: void f(boost::iterator_range<const char*> str) that doesn't dictate a storage strategy and yet is as efficient as std::string. I just haven't checked how clean the code becomes. — Yakov Galka, Jan 09 '11 at 18:25
@ybungalobill: You have added a dependency on boost and for many people that is a non-trivial concern. — CB Bailey, Jan 09 '11 at 18:27
it's just an idea. It may be anything else, like std::pair or perhaps implement your own lightweight wrapper. — Yakov Galka, Jan 09 '11 at 18:32
Personally I prefer `const string&` for the same reason why you often prefer `const char*` -- because it's simplest. Simplest, in practice, because then you can call the function with either a `string` object or a literal string, or a `const char*`. It will create a temporary in the first two cases, but I let the profiler tell me what I need to worry about, performance wise. It's usually not this in my case. — John Dibling, Jan 09 '11 at 18:41
Accepting mostly because you pinned down the problem of std::string "dictating a storage strategy even when the internals of the function don't need to specify this...". This made me think seriously about using ranges. — Yakov Galka, Jan 09 '11 at 19:32

score 1 · Answer 5 · answered Jan 09 '11 at 17:51


The answer should depend heavily on what you intend to do in f. If you need to do some complex processing with the string, approach (2) makes sense; if you simply need to pass it on to other functions, then choose based on those functions (let's say, for argument's sake, you are opening a file - what would make the most sense? ;) )


Nim


That's leaking an implementation detail. That's exactly what I want to avoid. – Yakov Galka Jan 09 '11 at 17:57
...really? how? what does taking a `const char*` tell you above a `const std::string&` - or what does the other hide? – Nim Jan 09 '11 at 18:04
@Nim, suppose that tomorrow I change my implementation so the other way is preferred. According to you I need to change my interface. That leaks an implementation detail: "this function uses XXX internally, so every time it'll change from XXX to YYY the interface will change accordingly". – Yakov Galka Jan 09 '11 at 18:09
@ybungalobill, erm why would you do that? There is nothing mentioned about subsequent alterations - anyways seems rather pointless to propagate such a change to the interface! All I was pointing out is that you should make the decision based on what you are doing in `f`, but once you've defined an interface - changing it for something like this is meaningless. NOTE: that aside, changing from a function that accepts a `const char*` to a `const std::string&` is possible without breaking existing code, but not the other way around... – Nim Jan 09 '11 at 18:22
@ybungalobill, of course, which is why I said "code" rather than builds or binaries or libraries etc! ;) – Nim Jan 09 '11 at 18:33
@Nim: that's why I mentioned the separate compilation issue and ruled out templated version! ;) – Yakov Galka Jan 09 '11 at 18:39

@Nim: "without breaking existing code" - not quite. If the argument expression I provide to the function is something with a user-defined conversion to `const char*`, then changing the function to `const std::string&` will break my code, because now there are too many user-defined conversions required to coerce it. Unusual situation, but if you're going to claim interface compatibility for your components then it matters. – Steve Jessop Jan 09 '11 at 18:52
@Steve Jessop, true - did not think of that... I guess my point was changing the interface because of an implementation detail seems pointless.. – Nim Jan 09 '11 at 18:58
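The user-defined conversion case raised in the comments above can be sketched as follows. The type `legacy_name` and the function `takes_pointer` are hypothetical; the `static_assert`s express the claim that an implicit conversion to `const std::string&` would need two user-defined conversions in a row, which the language does not allow.

```cpp
#include <cassert>
#include <cstddef>
#include <cstring>
#include <string>
#include <type_traits>

// Hypothetical caller-side type with a user-defined conversion to const char*.
struct legacy_name {
    const char* text;
    operator const char*() const { return text; }
};

// Works with the const char* signature: one user-defined conversion.
static_assert(std::is_convertible<legacy_name, const char*>::value,
              "legacy_name converts implicitly to const char*");

// Would break after switching the parameter to const std::string&:
// legacy_name -> const char* -> std::string needs two user-defined conversions.
static_assert(!std::is_convertible<legacy_name, const std::string&>::value,
              "legacy_name does not convert implicitly to std::string");

// Hypothetical function with the original const char* signature.
std::size_t takes_pointer(const char* str) { return std::strlen(str); }

void conversion_usage() {
    const legacy_name name = {"abc"};
    assert(takes_pointer(name) == 3);  // compiles via operator const char*
}
```

A caller in this position would have to write the conversion explicitly, for example `std::string(static_cast<const char*>(name))`, after the interface change.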

score 0 · Answer 6 · answered Jan 09 '11 at 17:51


It's also possible to write an overload and accept both:

void f(const string& str) already accepts both because of the implicit conversion from const char* to std::string. So #3 has little advantage over #2.
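A minimal sketch of the difference between (2) and (3), using hypothetical functions `only_string` and `both`. With a single `const std::string&` parameter a literal is accepted through a temporary; with the overload pair, a literal selects the `const char*` version and no temporary is built.

```cpp
#include <cassert>
#include <cstddef>
#include <string>

// (2): a single signature; a literal is converted to a temporary std::string.
std::size_t only_string(const std::string& str) { return str.size(); }

// (3): an overload pair; the return value reports which one was chosen.
enum class path { c_string, std_string };
path both(const char*) { return path::c_string; }
path both(const std::string&) { return path::std_string; }

void overload_usage() {
    assert(only_string("literal") == 7);               // via implicit conversion
    assert(only_string(std::string("object")) == 6);

    assert(both("literal") == path::c_string);         // no std::string created
    assert(both(std::string("object")) == path::std_string);
}
```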


dan04



It avoids the conversion. It allows the implementation to decide which version is better. – Yakov Galka Jan 09 '11 at 17:53

score 0 · Answer 7 · answered Jan 09 '11 at 17:51


I would choose void f(const string& str) if the function body does not do char-level analysis, that is, if it does not work on the underlying char* of str.


Nawaz



What exactly do you mean by `char`-analysis? How is a `std::string` any more or less suitable for analysis of its constituent `char` than `const char*`? – CB Bailey Jan 09 '11 at 18:05
@Charles : if the function body operates on characters of the string (more like parsing), then why pass `string` to begin with? – Nawaz Jan 09 '11 at 18:13
I presume you mean operate in a read-only sense as both alternatives in the question are read-only; isn't it just as easy to read the characters of a `std::string` as a `char` array (passed via `const char*`) - indeed you can use the same `[]` syntax? In fact, why pass the parameter at all if you're not going to look at its contents? – CB Bailey Jan 09 '11 at 18:17
@Charles : `[]` requires function call in case of string. I think that would be a bit slow if the function body's whole business is playing with the chars. – Nawaz Jan 09 '11 at 18:20

Have you tested the cost of the function call? I just compiled a function that returned an arbitrary character from a `std::string` passed in by `const` reference and it compiled to two `mov` and one `ret` instructions, no actual `call`. Besides, I thought you were backing having `std::string` in the interface and now you have me supporting that? – CB Bailey Jan 09 '11 at 18:25
@Charles : if there is no `call`, then I think, that is compiler optimization. I need to experiment with it – Nawaz Jan 09 '11 at 18:29
@Charles : no; I explicitly said that if the function body's whole business is playing with each char, the raw data, then in that case `char*` is preferred over `string`. `ifstream::open()` doesn't do such a thing, it operates on the string as a whole! – Nawaz Jan 09 '11 at 18:32
I'm not sure I understand the distinction between reading the `char`s and dealing with the string as a whole. If you read the string you read the `char`s. In any case, the most likely thing to happen to the filename parameter in `ifstream::open` is that it's passed to a lower-layer operating system function or system call, which in all probability will take a `const char*` anyway. – CB Bailey Jan 09 '11 at 18:54
@Charles : the last statement is convincing to me. :-) – Nawaz Jan 09 '11 at 19:03

score 0 · Answer 8 · answered Jan 09 '11 at 19:10


Use (2).

The first stated problem with it is not an issue, because the string has to be created at some point regardless.

Fretting over the second point smells of premature optimization. Unless you have a specific circumstance where the heap allocation is problematic, such as repeated invocations with string literals, and those cannot be changed, then it is better to favor clarity over avoiding this pitfall. Then and only then might you consider option (3).

(2) clearly communicates what the function accepts, and has the right restrictions.

Of course, all 5 are improvements over foo(char*) which I have encountered more than I would care to mention.


JohnMcG


"string has to be created at some point" no it's not, like in f(...) { cout << str; }, no strings created. "(2) clearly communicates what the function accepts" it's not. The function expects a random access sequence of characters, it doesn't necessarily expects an std::string object. I don't see why I need to specify in the interface that this sequence must be owned by std::string. – Yakov Galka Jan 09 '11 at 19:25
Were you looking for an answer or an argument? – JohnMcG Jan 09 '11 at 22:12
My first point is at some point a structure must be created to hold the string in memory. If you don't want that to be an object, then stick to C. – JohnMcG Jan 09 '11 at 22:14
For the second point, I guess I'm relying on my subjective experience that I understand what (2) is asking for faster than the template or iterator range signatures. – JohnMcG Jan 09 '11 at 22:15
