2

Like many people, I'm in the habit of writing new string functions so that they take a const std::string &. The advantages are efficiency (you can pass existing std::string objects without incurring overhead for copying/moving) and flexibility/readability (if all you have is a const char * you can just pass it and have the construction done implicitly, without cluttering up your code with an explicit std::string construction):

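#include <string>
#include <iostream>

unsigned int LengthOfStringlikeObject(const std::string & s)
{
    // length() returns std::string::size_type, so narrow explicitly.
    return static_cast<unsigned int>(s.length());
}

int main(int argc, const char * argv[])
{
    // argv[0] is a const char *; it is converted to std::string implicitly.
    unsigned int n = LengthOfStringlikeObject(argv[0]);
    std::cout << "'" << argv[0] << "' has " << n << " characters\n";
}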

My aim is to write cross-platform code that handles long strings efficiently. My question is: what happens during the implicit construction? Are there any guarantees that the string will not be copied? It strikes me that, because everything is const, copying is not necessary (a thin STL wrapper around the existing pointer is all that's needed), but I'm not sure how compiler- and platform-dependent I should expect that behavior to be. Would it be safer to always explicitly write two versions of the function, one for const std::string & and one for const char *?

jez
  • can't you use `std::string`s always instead of `char *` ? I mean `argv[0]` has to be copied only once to a `std::string`, all the rest can be `std::string` – 463035818_is_not_an_ai May 05 '20 at 20:53
  • @idclev463035818 Let's say it's potentially a very large string and I want to avoid duplicating it in memory, even once at the beginning. – jez May 05 '20 at 20:56
  • I've written two functions (or member functions) in cases where I wanted to avoid making a copy of the const char *. But those situations were limited. I guess the answer would depend on how many functions would require this? – Anon Mail May 05 '20 at 21:07

3 Answers

3

If you pass a const char* to a function that takes a std::string, whether by value or by const reference, a temporary std::string is constructed. A compiler might even warn that an implicit temporary object is being created.

The compiler may be able to optimize this, and some implementations do not allocate heap memory for short strings (the small-string optimization). In principle the compiler could also treat the temporary much like a C++17 string_view. It essentially depends on what you do with the string in your code: if you only call const member functions, a clever compiler might optimize the copy out.

But that is up to the implementation and outside your control. If you want to take control yourself, use std::string_view explicitly.
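One way to see the temporary for yourself is to compare the address of the characters the function receives with the address you passed in. The following is a minimal sketch, assuming a C++17 compiler; the helper names `SharesStorageViaString` and `SharesStorageViaView` are made up for illustration and are not part of any library.

```cpp
#include <string>
#include <string_view>

// Hypothetical helper: reports whether the std::string the function received
// stores its characters at the same address as the original pointer.
// A std::string always owns its own storage, so this is expected to be false.
bool SharesStorageViaString(const char * original, const std::string & s)
{
    return s.data() == original;
}

// Hypothetical helper: the same check for std::string_view, which only
// records a pointer and a length, so this is expected to be true.
bool SharesStorageViaView(const char * original, std::string_view sv)
{
    return sv.data() == original;
}
```

Calling both helpers as `SharesStorageViaString(p, p)` and `SharesStorageViaView(p, p)` with the same `const char * p` lets the second argument go through the implicit conversion in each case, which is the situation the question describes.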

Michael Chourdakis
3

If you don't want copying, then string_view is what you want.

However, with this benefit come problems. Specifically, you have to ensure that the storage you pass lasts "long enough", that is, at least as long as the string_view that refers to it.

For string literals, that's no problem. For argv[0], that's almost certainly not a problem. For arbitrary sequences of characters, you'll need to think about their lifetime.

But you can write:

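#include <string_view>  // requires C++17

unsigned int LengthOfStringlikeObject(std::string_view sv)
{
    // No std::string is constructed; sv only refers to the caller's characters.
    return static_cast<unsigned int>(sv.length());
}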

and call it with a string, or a const char *, and it will be fine.
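To make the "long enough" point concrete, here is a small sketch, assuming C++17. `LengthOfView` repeats the function above with a `std::size_t` return type, and `KeepPrefix` is a hypothetical helper showing one way to stay safe: when a result has to outlive the caller's buffer, return an owning std::string rather than a view.

```cpp
#include <cstddef>
#include <string>
#include <string_view>

// Accepts std::string, const char *, and string literals without copying.
std::size_t LengthOfView(std::string_view sv)
{
    return sv.length();
}

// Hypothetical helper: copies the first n characters into an owning string,
// so the result stays valid after the caller's buffer is gone.
std::string KeepPrefix(std::string_view sv, std::size_t n)
{
    // string_view::substr clamps n to the available length.
    return std::string(sv.substr(0, n));
}

// The pattern to avoid (left as a comment on purpose):
//     std::string_view dangling = std::string("temporary");
// The temporary std::string is destroyed at the end of that statement,
// so any later read through 'dangling' is undefined behavior.
```

Passing a temporary directly as an argument, as in `LengthOfView(std::string("hello"))`, is fine, because the temporary lives until the call returns. The trouble only starts when the view is stored and used after its source has been destroyed.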

Marshall Clow
2

It strikes me that, because everything is const, copying is not necessary—a thin STL wrapper around the existing pointer is all that's needed

I don't think this assumption is correct. Just because you have a pointer to const, it does not imply that the underlying value cannot change. It only implies that the value cannot be changed through that pointer. The pointer could be pointing to non-const storage which can change at any time.

Because of this, the library must make its own copy so that the string behaves correctly: its observable contents must not change when the original buffer does. A quick review of libstdc++ shows that it always makes a copy. The constructor taking a char* is not inline, so the copy cannot be optimized away without static linking and LTO.

While extremely trivial statically linked programs might have the copy optimized away with LTO (I wasn't able to reproduce this), I think in general it would be unlikely this optimization could be performed (especially considering the aliasing rules for char*). g++ doesn't even perform this optimization for a string literal.
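The point about pointer-to-const can be illustrated with a short sketch, assuming C++17. The `Observed` struct and the `MutateAfterWrapping` function are hypothetical names used only for this demonstration.

```cpp
#include <string>
#include <string_view>

// Hypothetical result type: what each wrapper reports after the buffer changes.
struct Observed
{
    std::string from_string;
    std::string from_view;
};

// Hypothetical demo: wraps a buffer through a pointer-to-const, then modifies
// the buffer directly and records what each wrapper sees afterwards.
Observed MutateAfterWrapping()
{
    char buffer[] = "cat";
    const char * p = buffer;     // const here restricts p, not the buffer

    std::string owned(p);        // copies the characters
    std::string_view view(p);    // refers to the characters in buffer

    buffer[0] = 'b';             // legal: the storage itself is not const

    return Observed{owned, std::string(view)};
}
```

The owning string still holds "cat", while the view reflects the change and reads "bat". If std::string were only a thin wrapper around the pointer, its contents would change underneath it in the same way.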

Mikel Rychliski
  • Right, I hadn't appreciated that. Passing a `const` parameter is saying "I need *you* not to mess with this data" but it doesn't promise "*I* will not be messing with this data". The other answers are very useful—I learned that `string_view` is the name for the behavior I was expecting, and was reminded that persistence is its critical gotcha—but this insight into the STL's point of view provided the true facepalm moment. – jez May 05 '20 at 22:35