
In all the examples I see that use some kind of buffering, they use a stream instead of a string. How are std::ostringstream and operator<< different from using string.append? Which one is faster, and which one uses fewer resources (memory)?

One difference I know of is that you can output different types (like integers) to an output stream, rather than only the limited set of types that string::append accepts.

Here is an example:

std::ostringstream os;
os << "Content-Type: " << contentType << ";charset=" << charset << "\r\n";
std::string header = os.str();

vs

std::string header("Content-Type: ");
header.append(contentType);
header.append(";charset=");
header.append(charset);
header.append("\r\n");

Obviously the stream version is shorter, but I think append returns a reference to the string, so it can be chained like this:

std::string header("Content-Type: ");
header.append(contentType)
  .append(";charset=")
  .append(charset)
  .append("\r\n");

And with an output stream you can do:

std::string content;
...
os << "Content-Length: " << content.length() << "\r\n";

But what about memory usage and speed? Especially when used in a big loop.

Update:

To be clearer, the question is: which one should I use, and why? Are there situations where one is preferred over the other? As for performance and memory... well, I think benchmarking is the only way, since every implementation could be different.

Update 2:

Well, the answers don't give me a clear idea of what I should use, which suggests that any of them will do the job, plus vector. Cubbi did a nice benchmark, and Dietmar Kühl's addition shows that the biggest difference is the construction of those objects. If you are looking for an answer, you should check that too. I'll wait a bit longer for other answers (see the previous update), and if I don't get one, I think I'll accept Tolga's answer, because he has already used vector for this before, which suggests vector should be less resource-hungry.

NickSoft
  • Off-topic: You should also look for a fast function to convert an integer to a string/char array. sprintf/itoa do not perform well for a simple integer-to-decimal-string conversion like the one for Content-Length. – Etherealone Nov 07 '13 at 20:20
  • sprintf could be slow because of formatting options, but why do you think itoa is slow? – NickSoft Nov 08 '13 at 05:29
  • I shouldn't have written itoa there. I meant itoa should not be an option because it is non-standard. But I remember comparing it to these: https://gist.github.com/anonymous/7179097 – Etherealone Nov 08 '13 at 15:07

4 Answers


Constructing a stream object is a significantly more complex operation than constructing a string object, because it has to hold (and therefore construct) its std::locale member, among other things needed to maintain state (but the locale is by a large margin the heaviest).

Appending is similar: both maintain a contiguous array of characters, and both allocate more when the capacity is exceeded. The only differences I can think of are that when appending to a stream, there is one virtual member function call per overflow (in addition to memory allocation/copying, which dominates overflow handling anyway), and operator<< has to do some extra checks of the stream state.

Also, note that you're calling str(), which copies the entire string one more time, so, as your code is written, the stream example does more work and should be slower.

Let's test:

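#include <sstream>
#include <string>
#include <numeric>

// Build with exactly one of -DTEST_STREAM or -DTEST_STRING, e.g.
//   g++ -O2 -DTEST_STRING bench.cpp
#if defined(TEST_STREAM) == defined(TEST_STRING)
#error "define exactly one of TEST_STREAM or TEST_STRING"
#endif

volatile unsigned int sink;   // volatile so the loop's work is not optimized away
std::string contentType(50, ' ');
std::string charset(50, ' ');
int main()
{
    for (long n = 0; n < 10000000; ++n)
    {
#ifdef TEST_STREAM
        std::ostringstream os;
        os << "Content-Type: " << contentType << ";charset=" << charset << "\r\n";
        std::string header = os.str();
#endif
#ifdef TEST_STRING
        std::string header("Content-Type: ");
        header.append(contentType);
        header.append(";charset=");
        header.append(charset);
        header.append("\r\n");
#endif
        // Consume the result so the compiler cannot drop the construction.
        sink += std::accumulate(header.begin(), header.end(), 0);
    }
}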

That's 10 million repetitions.

On my Linux machine, I get:

                   stream         string
g++ 4.8          7.9 seconds      4.4 seconds
clang++/libc++  11.3 seconds      3.3 seconds

So, for this use case, in these two implementations, strings appear to be faster, but obviously both approaches have a lot of room for improvement (reserve() the string, move stream construction out of the loop, use a stream that doesn't require copying to access its buffer, etc.).

Cubbi
  • You are forgetting handling something like `std::ios_base::width` – Slava Nov 07 '13 at 20:07
  • @Slava edited in an honorable mention as extra payload for stream construction: string's operator<< doesn't have anything special to do when width is zero. – Cubbi Nov 07 '13 at 20:16
  • Changing the setup slightly to construct the stream outside the loop and merely resetting it (`os.str("")`) changes the numbers in interesting ways: the stream is now faster on gcc but slower on clang. I get gcc/string=4.5s, gcc/stream=2.5s, clang/string=2.25s, clang/stream=4.1s: nicely crossed over ;) – Dietmar Kühl Nov 07 '13 at 20:25
  • So unless you construct the stream every time, it's actually comparable to using a string. – NickSoft Nov 08 '13 at 05:24

std::ostringstream is not necessarily stored as a sequential array of characters in memory. You would actually need a contiguous array of characters when sending those HTTP headers, and getting one might copy or modify the internal buffer to make it sequential.

With an appropriate std::string::reserve call, std::string has no reason to be slower than std::ostringstream in this situation.

However, std::ostringstream is probably faster for appending if you have absolutely no idea how much to reserve. If you use std::string and the string grows, it eventually requires reallocating and copying the whole buffer. A single std::ostringstream::str() call that makes the data sequential at once would be better than the multiple reallocations that would happen otherwise.

P.S. Before C++11, std::string was not required to be contiguous either, although almost all libraries implement it that way. You could take the risk or use std::vector<char> instead. You would append like this:

std::vector<char> buf;
const char str[] = ";charset=";
// sizeof(str) - 1 excludes the terminating '\0'
buf.insert(buf.end(), str, str + sizeof(str) - 1);

std::vector<char> would probably be best for performance because it is most likely cheaper to construct, though that is probably not significant compared to std::string and the time the actual construction takes. I have done something similar to what you are trying and went with std::vector<char>, purely for logical reasons: vector seemed to fit the job better, since you don't actually need string manipulation. Also, benchmarks I did later showed it performed better, though maybe that was only because I had not implemented the operations well enough with std::string.

When choosing, the container that meets your requirements with the fewest extra features usually does the job best.

Etherealone
  • growing buffers on the fly is counterintuitively cheap. – jthill Nov 07 '13 at 19:48
  • *Using appropriate reserve* I agree; otherwise it implies continuous reallocation of memory and therefore lower performance. And the fact that `ostringstream` doesn't store it sequentially (for performance reasons) doesn't mean you cannot fetch it in a contiguous buffer with `str().c_str()`. – Havenard Nov 07 '13 at 19:51
  • stream buffers are sequential in memory, their entire non-virtual interface (sgetc/sputc/etc), relies on it, since it works through pointers. – Cubbi Nov 07 '13 at 19:53
  • @Tolga I don't quite understand why I have to care how the stream is stored, sequentially or not. When I need it, I can always fetch sequential data, as Havenard said, using .str().c_str() or .str().data() combined with .str().length() or size(). The same is valid for std::string. Regardless of implementation, you get sequential memory using c_str() or data(). – NickSoft Nov 07 '13 at 19:58
  • @NickSoft it requires an extra operation to make it sequential while you can already have it sequential without any operation, if you need absolute performance. He is probably trying to write a high performance web server and these string operations are usually where the bottleneck is since it is mostly all the web server does. – Etherealone Nov 07 '13 at 19:59
  • @Havenard You are right, I have edited my answer to make the relation between reallocation and serialization more clear. – Etherealone Nov 07 '13 at 20:02
  • Now you mentioned `std::vector`, [I have seen this being used before](https://github.com/TrinityCore/TrinityCore/tree/master/src/server/shared/Packets) to implement protocols, I only don't know if they do that because of performance or because it can contain null bytes, ignore charsets etc. This stuff can be important when building buffers that must be binary safe. – Havenard Nov 07 '13 at 20:08
  • @Tolga Yes, I know it's an extra operation, but it's one operation. Growing the buffer in string::append() copies data on every append/grow unless memory is preallocated, but you don't know the final size to pass to string::reserve(). It still seems strange to me to use vector. Is it really a good option, and how do you convert the vector to a string when you need to send it? – NickSoft Nov 08 '13 at 05:18
  • @Havenard strings and streams have no problems with null bytes, but this is one more example usage of vector. I wonder why they bothered to create streams when people would use vector instead. – NickSoft Nov 08 '13 at 05:19
  • @NickSoft You would just send the buffer using vector.data(). I explicitly said he should know the size to reserve. This is an HTTP server/client; the headers won't get bigger than a certain size for 90% of requests, so he does not need to know the exact size to reserve. – Etherealone Nov 08 '13 at 10:09
  • vector::data() is C++11 according to cplusplus.com. My project is C++98. If there is no way to get the raw data from vector other than char by char with vector::at() then it can't be used as buffer (efficiently). – NickSoft Nov 08 '13 at 10:47
  • @NickSoft Since vector is sequential, you can access its buffer by accessing its first element: `char const* data = &vector[0];` – Etherealone Nov 08 '13 at 14:53
  • @NickSoft I don't think any headers will exceed 2KB which seems to be a good value for reserve. Even if you have 1 million clients connected it would only use 2GB of RAM which will be nothing compared to what your database server will need to perform decently with that amount of traffic assuming only 5~10% of actual users will be on simultaneously (of course this is a raw assumption and database operations may not exist at all if not very cheap). You can even log statistics and do calculations once a day to find the right size to reserve dynamically. – Etherealone Nov 08 '13 at 15:24
  • @Tolga From the description on cplusplus.com, I can only assume that vector is sequential "outside". They say nothing about how it MUST be implemented internally. `Individual elements are accessed by their position in this sequence` - position in sequence, i.e. index. No one talks about memory addresses. I need a bit more to assume sequential memory. Can you give me a quote from a well-known source? – NickSoft Nov 08 '13 at 15:46
  • @Tolga Yes, I will pre-allocate headers memory. I can spend as much as I want since I'm replacing php which uses minimum memory of tens of MB. 2kb is nothing compared to that. I'll probably pre-allocate memory for the web document too. I just want to figure out what buffer implementation to use. – NickSoft Nov 08 '13 at 15:49
  • @NickSoft http://en.cppreference.com/w/cpp/container/vector : The elements are stored contiguously, which means that elements can be accessed not only through iterators, but also using offsets on regular pointers to elements. This means that a pointer to an element of a vector may be passed to any function that expects a pointer to an element of an array. (By the way I suggest using cppreference.com to lookup things) – Etherealone Nov 08 '13 at 18:21
  • Well that is written pretty clearly. I use google to search for reference because I'm lazy. The description at cppreference.com is way better (at least about vector). – NickSoft Nov 08 '13 at 20:25
  • @jthill, could you explain why it is cheap? Any references or examples? – Vassilis May 20 '20 at 10:36
  • @Vassilis say you double the buffer size and copy the existing contents every time it winds up too small. Worst case, assuming you started with a one byte buffer, on average every element has been copied twice: all once, half another, a quarter another an eighth another, 1.111111 binary is two. – jthill May 20 '20 at 15:51

With streams, you can overload operator<< for your class MyClass so that you can write:

MyClass x;
std::ostringstream y;
y << x;

With append, you need a ToString method (or something similar), since you can't overload std::string's append member function for your own types.

For small pieces of code, use whatever you feel more comfortable with. Use streams in bigger projects, where it's useful to be able to simply stream an object.

Sorin
  • but as Dieter Lücking pointed out you could use + to append strings. You can easily override + operator. – NickSoft Nov 07 '13 at 20:01
  • True, but not the append function. If you override the + operator you can run into trouble for not overriding all orders, or when the compiler decides to evaluate some other operation first. I'd recommend against overriding + operator, unless your class is some scalar or vector value. – Sorin Nov 07 '13 at 20:06

If you are concerned about speed, you should profile and/or test. In theory, std::string::append should be no slower, as it is simpler (a stream has to deal with the locale and different formatting options, and has to be more generic). But how much faster one solution or the other really is, you can only find out by testing.

Slava