I have written a small test that compares how fast a container can be resized and then filled with std::generate_n. I'm comparing std::string and std::vector<char>. Here is the program:

#include <algorithm>
#include <iostream>
#include <iterator>
#include <random>
#include <string>
#include <vector>

int main()
{
    std::random_device rd;
    std::default_random_engine rde(rd());
    std::uniform_int_distribution<int> uid(0, 25);

    #define N 100000

#ifdef STRING
    std::cout << "String.\n";
    std::string s;
    s.resize(N);
    std::generate_n(s.begin(), N, 
                    [&]() { return (char)(uid(rde) + 65); });
#endif

#ifdef VECTOR
    std::cout << "Vector.\n";
    std::vector<char> v;
    v.resize(N);
    std::generate_n(v.begin(), N, 
                    [&]() { return (char)(uid(rde) + 65); });
#endif

    return 0;
}

And my Makefile:

test_string:
    g++ -std=c++11 -O3 -Wall -Wextra -pedantic -pthread -o test test.cpp -DSTRING
    valgrind --tool=callgrind --log-file="test_output" ./test
    cat test_output | grep "refs"

test_vector:
    g++ -std=c++11 -O3 -Wall -Wextra -pedantic -pthread -o test test.cpp -DVECTOR
    valgrind --tool=callgrind --log-file="test_output" ./test
    cat test_output | grep "refs"

And the comparisons for certain values of N:

N=10000
String: 1,865,367
Vector: 1,860,906

N=100000
String: 5,295,213
Vector: 5,290,757

N=1000000
String: 39,593,564
Vector: 39,589,108

std::vector<char> comes out ahead every time. Since it seems to be more performant, what is even the point of using std::string?

user3992511
  • 11
    Those differences are tiny enough (the first is around 0.2%, the last 0.01%) to require that you re-run this many times and take a mean. – juanchopanza Aug 30 '14 at 10:05
  • 10
    It also seems that the difference doesn't scale with `N`, implying this is a fixed overhead in your program, rather than a systematic problem. – Oliver Charlesworth Aug 30 '14 at 10:05
  • 2
    This data does not provide any meaningful proof that `std::string` is, in fact, slower than `std::vector`. For one thing, you only have data for one implementation with one set of flags. – Puppy Aug 30 '14 at 10:18
  • 2
    The *point* of `string` is not optimisation. The point is the interface for string operations. While some string implementations may have optimisations for typical uses of strings, I don't think your test is a very typical use case. Also, you are just measuring the number of memory accesses. That does not necessarily match the run time because of pipelining and caching (though it probably does). – eerorika Aug 30 '14 at 10:23
  • 1
    I used this: http://coliru.stacked-crooked.com/a/0c29bea5e7b43862 and got 39021ms (String) versus 39039ms (Vector) on my i7/32GiB box. That's **0.05% faster for String** at N = 1,000,000,000 – sehe Aug 30 '14 at 10:30
  • Why aren't you profiling a useful metric, i.e. *runtime*? – Oliver Charlesworth Aug 30 '14 at 10:37
  • 3
    Because it's based on an invalid premise. – Oliver Charlesworth Aug 30 '14 at 11:27
  • 1
    Why the upvotes? That is the question. – juanchopanza Aug 30 '14 at 14:54
  • I don't think one should "punish" someone by downvoting a question which merely criticises common wisdom. The OP's logic is wrong, and I hope he now knows why, but he actually tried to measure performance (which happens all too rarely), asked here for verification, explained his point very clearly and with small compilable code, even with a Makefile. This investigative attitude easily earns him an upvote IMO. – Christian Hackl Aug 31 '14 at 14:49

2 Answers

7

I used #define N 100000000 and ran each scenario 3 times. With both compilers, string is faster on average. I did not use Valgrind; timing the program under it does not make sense.

OS: Ubuntu 14.04. Arch: x86_64. CPU: Intel(R) Core(TM) i5-4670 CPU @ 3.40GHz.

$COMPILER -std=c++11 -O3 -Wall -Wextra -pedantic -pthread -o test x.cc -DVECTOR    
$COMPILER -std=c++11 -O3 -Wall -Wextra -pedantic -pthread -o test x.cc -DSTRING

Times:

compiler/variant    | time(1) | time(2) | time(3)
--------------------+---------+---------+--------
g++ 4.8.2/vector    | 1.724s  | 1.704s  | 1.669s
g++ 4.8.2/string    | 1.675s  | 1.678s  | 1.674s
clang++ 3.5/vector  | 1.929s  | 1.934s  | 1.905s
clang++ 3.5/string  | 1.616s  | 1.612s  | 1.619s
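
Wall-clock timing of this kind can also be done inside the program with std::chrono rather than with an external timer. The following is only a minimal sketch: time_fill and mean_fill_seconds are hypothetical helper names that are not part of the original test, and the fixed seed is an assumption made so that both container types see the same random sequence.

```cpp
#include <algorithm>
#include <chrono>
#include <cstddef>
#include <random>
#include <string>
#include <vector>

// Hypothetical helper: resizes a Container to n elements, fills it with
// random capital letters, and returns the elapsed wall-clock time in seconds.
template <typename Container>
double time_fill(std::size_t n, std::default_random_engine& engine)
{
    std::uniform_int_distribution<int> letter(0, 25);

    auto const start = std::chrono::steady_clock::now();
    Container c;
    c.resize(n);
    std::generate_n(c.begin(), n,
                    [&]() { return static_cast<char>(letter(engine) + 'A'); });
    auto const stop = std::chrono::steady_clock::now();

    // Read one element through a volatile so the fill has an observable
    // effect and is harder for the optimiser to discard.
    if (!c.empty()) {
        volatile char sink = c[n / 2];
        (void)sink;
    }

    return std::chrono::duration<double>(stop - start).count();
}

// Hypothetical helper: mean time of `runs` fills of n elements.
template <typename Container>
double mean_fill_seconds(std::size_t n, int runs)
{
    // Assumption: a fixed seed, so std::string and std::vector<char>
    // are measured against the same random sequence.
    std::default_random_engine engine(12345);

    double total = 0.0;
    for (int i = 0; i < runs; ++i) {
        total += time_fill<Container>(n, engine);
    }
    return total / runs;
}
```

Calling mean_fill_seconds<std::string>(N, runs) and mean_fill_seconds<std::vector<char>>(N, runs) with the same arguments gives two means that can be compared directly. With differences as small as the ones discussed here, a larger number of runs is advisable.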
arhuaco
  • 2
    This is, at best, a comment. – Puppy Aug 30 '14 at 10:22
  • 1
    Well, I think it is useful as an answer. First, I don't know why user3992511 cares about times with valgrind, it does not make sense. Then, I show him that by using better numbers string is indeed faster with two compilers (at least with the set of flags he used). Why should my answer be downvoted? Of course, it is not my choice. Everybody can vote as they wish. – arhuaco Aug 30 '14 at 10:25
  • It is a comment. Just make it [concise and readable](http://stackoverflow.com/questions/25581207/why-is-stdvectorchar-faster-than-stdstring#comment39954791_25581207) – sehe Aug 30 '14 at 10:31
  • 12
    It's an answer. The question is "Why is vector faster" and this answer shows that vector is not faster (universally at least). The question about valgrind within the answer should be a comment though. – eerorika Aug 30 '14 at 10:34
  • @user2079303 That just invalidates the question. By the way, I find the answer completely unreadable. If there's a point, just make it. If you need data, provide a matrix or a graph. But it's much easier to say it's all fluff and irrelevant. Just a large mess of numbers intermixed with foggy shell debris does not quality information make. – sehe Aug 30 '14 at 10:36
  • 1
    @sehe Should an invalid question not be answered at all? Sure it's a waste of one's own time, but why not? Is there a reason to close this question instead? – eerorika Aug 30 '14 at 10:42
  • No matter what happens, I removed a lot of noise from the answer. @sehe: What is the main reason for my "answer" not being a valid answer but a comment? I want to know for the future. – arhuaco Aug 30 '14 at 10:48
  • Well, the question is not answerable. You raise a valid point, but they don't answer the question, indeed they just invalidate the question. This is what the comments or close votes are for usually. (Thanks for the formatting improvements. It really makes a huge difference) – sehe Aug 30 '14 at 10:58
  • @Puppy it's too big to fit in a comment – M.M Aug 30 '14 at 11:46
  • You need to run more than three times with gcc. The differences are not large enough to draw any conclusions from 3 trials. – juanchopanza Aug 30 '14 at 14:20
  • 1
    @juanchopanza At least the difference with clang++ is big. That's why I did not bother. – arhuaco Aug 30 '14 at 18:10
5

std::vector comes out ahead every time. Since it seems to be more performant, what is even the point of using std::string?

Even if we suppose that your observation holds true for a wide range of different systems and different application contexts, it would still make sense to use std::string for various reasons, which are all rooted in the fact that a string has different semantics than a vector. A string is a piece of text (at least simple, non-internationalised English text); a vector is just a collection of characters.

Two things come to mind:

  • Ease of use. std::string can be constructed from string literals, has a lot of convenient operators and can be subject to string-specific algorithms. Try std::string x = "foo" + ("bar" + boost::algorithm::replace_all_copy(f(), "abc", "ABC").substr(0, 10)) with a std::vector<char>...

  • Short strings. Some implementations of std::string, MSVC's among them, use Small-String Optimization (SSO), which avoids heap allocation entirely for strings that fit into a small buffer inside the object. SSO is based on the observation that strings are often very short, which certainly cannot be said about vectors.
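
To illustrate the ease-of-use point without Boost, here is a minimal sketch that builds the same greeting with both types. The function names greet_string and greet_vector are made up for this illustration.

```cpp
#include <algorithm>
#include <cstddef>
#include <string>
#include <vector>

// With std::string, literals, operator+ and substr work directly.
std::string greet_string(std::string const& name)
{
    return "Hello, " + name.substr(0, 10) + "!";
}

// With std::vector<char>, the same result has to be assembled by hand.
std::vector<char> greet_vector(std::vector<char> const& name)
{
    char const prefix[] = "Hello, ";
    // sizeof(prefix) counts the terminating '\0', so leave it out.
    std::vector<char> result(prefix, prefix + sizeof(prefix) - 1);

    // Equivalent of substr(0, 10): copy at most the first 10 characters.
    auto const count = std::min<std::size_t>(name.size(), 10);
    result.insert(result.end(), name.begin(),
                  name.begin() + static_cast<std::ptrdiff_t>(count));

    result.push_back('!');
    return result;
}
```

Both functions produce the same characters, but only the std::string version reads like the text operation it performs.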

Try the following:

#include <iostream>
#include <vector>
#include <string>

int main()
{
    char const array[] = "short string";

#ifdef STRING
    std::cout << "String.\n";
    for (int i = 0; i < 10000000; ++i) {
        std::string s = array;
    }
#endif

#ifdef VECTOR
    std::cout << "Vector.\n";
    for (int i = 0; i < 10000000; ++i) {
        std::vector<char> v(std::begin(array), std::end(array));
    }
#endif
}

The std::string version should outperform the std::vector version, at least with MSVC. The difference is about 2-3 seconds on my machine. For longer strings, the results should be different.
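
One way to check whether a particular standard library applies SSO to a given string is to test whether the character buffer lives inside the std::string object itself. This is only a sketch under stated assumptions: stored_inline is a hypothetical helper, and comparing addresses through std::uintptr_t relies on the flat address space of common platforms rather than on anything the standard guarantees.

```cpp
#include <cstdint>
#include <string>

// Hypothetical helper: returns true if the character buffer of `s` is
// located inside the std::string object itself, which is what an SSO
// implementation does for sufficiently short strings.
bool stored_inline(std::string const& s)
{
    auto const object_begin = reinterpret_cast<std::uintptr_t>(&s);
    auto const object_end = object_begin + sizeof(std::string);
    auto const buffer = reinterpret_cast<std::uintptr_t>(s.data());

    return buffer >= object_begin && buffer < object_end;
}
```

For std::string("short string"), the result depends on the implementation; for a string far longer than sizeof(std::string), the buffer cannot fit inside the object, so the function returns false.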

Of course, this does not really prove anything either, except two things:

  • Performance tests depend a lot on the environment.
  • Performance tests should test what will realistically be done in a real program. In the case of strings, your program may deal with many small strings rather than a single huge one, so test small strings.
Christian Hackl
  • You should not define STRING by default to avoid duplicate work on test. OS: Ubuntu 14.04. Arch: x86_64. CPU: Intel(R) Core(TM) i5-4670 CPU @ 3.40GHz. g++ 4.8.2/vector. Times: 0.218s, 0.206s, 0.205s. g++ 4.8.2/string. Times: 0.320s, 0.324s, 0.318s. clang++ 3.5/vector. Times: 0.218s, 0.210s, 0.210s. clang++ 3.5/string. Times: 0.320s, 0.324s, 0.318s. So vector is indeed faster with your example that uses small strings. Interesting. – arhuaco Aug 30 '14 at 12:54
  • @arhuaco: I did the comparison with `VECTOR` defined. Just happened to post the code like this. Guess I'll remove the `#define` for clarity. Does clang even use SSO? – Christian Hackl Aug 30 '14 at 16:30