
Qt's QStrings can be concatenated with operator%, which uses expression templates to precalculate the resulting string's size and thereby optimize a chain of several operator+ calls. See this question of mine for more info.

Why hasn't std::basic_string adopted a similar construct? Is this even allowed by C++11? I see only advantages, and ABI compatibility clearly can be broken by library implementors when they want to (C++11 provided a good reason even for libstdc++).

rubenvb
  • +1 for interesting question. However, I think the answer is trivial: you can implement that yourself. A typical (?) string builder has amortized linear time anyway for a sequence of concatenations. – Cheers and hth. - Alf Aug 25 '12 at 11:34
  • @Cheersandhth.-Alf that answer is beside the question. The Standard provides optimized algorithms for various types of containers, and a stringstream isn't always practical in code where `operator+` makes more sense. – rubenvb Aug 25 '12 at 11:38
  • Uhm, I mostly use a thingy which delegates to `+=` on a string and supports syntax like `S() << "A = " << 6*7`. But that's mostly because I just *feel* that a stringstream would be too inefficient. I haven't even measured... – Cheers and hth. - Alf Aug 25 '12 at 11:46
  • Also note that while *efficient string concatenation* is something that could be proposed, the exact form might not be expression templates. For example, `concat(a,b,c,d)` makes more sense to me than `a % b % c % d`; in particular, I dislike operator overloads when the domain does not have a clear meaning for the operation. What is the *modulo* of two strings? The characters left after removing the second string from the first one as many times as possible? – David Rodríguez - dribeas Aug 25 '12 at 18:11
  • @DavidRodríguez-dribeas +1 Perfect argument. I hate it when vector libraries redefine modulo to mean a cross product. – Christian Rau Aug 27 '12 at 11:06
  • @DavidRodríguez-dribeas My "proposal" would then be to obviously reuse `operator+` instead of using something unrelated. But as pointed out below, it wouldn't work everywhere (like with `auto`) and a different approach would be needed. – rubenvb Aug 27 '12 at 12:14

2 Answers


Because nobody proposed it for the standard; unless someone proposes something, it doesn't get in. Also because it could break existing code (at least, if it were done by changing what operator+ returns).

Also, expression templates don't work well in the presence of `auto`. Something as simple as `auto concat = str1 % str2;` can easily break. Hopefully, this is an issue that C++17 will resolve by some means.

Nicol Bolas
  • How would `auto` be broken with expression templates? – rubenvb Aug 25 '12 at 12:23
  • @rubenvb: `auto` causes the same trouble for expression templates as for any other "transparent" wrapper: It will deduce `concat`'s type to be the same as the expression template wrapper, and not a `std::string`. – Xeo Aug 25 '12 at 12:45
  • @rubenvb: What is the type of `str1 % str2`? – David Rodríguez - dribeas Aug 25 '12 at 12:45
  • For the `auto` issue, maybe it could be a variadic template function returning std::string, for example: `auto s = std::concat(someString, someStringView, ".backup")`? – Vincas Dargis Mar 08 '17 at 12:48

In C++11, std::basic_string supports move semantics, which means that a series of concatenations with operator+ can be optimized: memory is allocated for the first string in the series, and each subsequent string is then appended into that same buffer instead of into a fresh temporary. This vastly reduces the number of memory allocations and copies needed to concatenate and return a series of strings.

I'm sure further optimizations are possible, as you point out with Qt's method, but the move semantics introduced in C++11 overcome a major performance hurdle that existed in the C++03 version of std::basic_string, especially when concatenating many strings together.

So for instance, something like

```cpp
std::string a = std::string("Blah blah") + " Blah Blah " + " Yadda, Yadda";
```

can be done by allocating memory for the first string and then, using move semantics, appending the two remaining literals in place into the first string's spare capacity, reallocating only when that capacity runs out. Finally, `a` is initialized from the temporary on the right-hand side by a move (or the copy is elided entirely), so the concatenated string is never copied.

Jason
  • Exactly how are the memory requirements calculated for the first string? – Cheers and hth. - Alf Aug 25 '12 at 11:32
  • Well, there is typically some remaining space in the first string, and each reallocation then doubles the space of the preceding allocation, the same way that `std::vector` does. That should reduce the number of memory allocations over time. – Jason Aug 25 '12 at 11:39
  • In this particular example, the small string optimization is likely to eat the first two parts, resulting in a single allocation for the whole string. But it is true that `std::string` isn't optimized for large number of operations on huge strings. – Bo Persson Aug 25 '12 at 11:47
  • @Jason: For large enough strings (so that the small object optimization does not take place) I would expect the library to allocate the amount of memory needed. While for this particular use case you might want extra memory (say twice), in the general case (for example, strings stored in a `std::map`) you don't want each one of the strings to take twice the amount of memory that is *needed*. – David Rodríguez - dribeas Aug 25 '12 at 12:49
  • I agree that move semantics alleviate part of the problem, but the explanation + example are fishy. The 2nd `std::string(" Blah Blah ")` will needlessly allocate for example, and it's also likely that `std::string(" Blah Blah ")` would allocate just enough memory. – Matthieu M. Aug 25 '12 at 14:44
  • Yeah, this probably wasn't the greatest example ... btw, I'm now using string literals for the second and third strings ... hopefully that should make the example a bit more explicit without going through needless memory allocations from the constructors on the other two `std::string` objects I was using before. – Jason Aug 25 '12 at 21:33