6

Profiling my application reveals that it spends nearly 5% of its CPU time in string allocation. In many, many places I construct C++ std::string objects from a 64MB char buffer. The thing is, the buffer never changes while the program is running. My analysis of the std::string(const char *buf, size_t buflen) calls is that the string is copied because the buffer might change after the string is made. That isn't a concern here. Is there a way to avoid the copy?

EDIT: I am working with binary data, so I can't just pass around char *s. Besides, then I would have a substantial overhead from always scanning for the NULL, which the std::string avoids.

vy32
  • 1
    Do you have a single non-mutable buffer? Can you use a singleton for the std::string object? – Chad Oct 09 '12 at 15:32

5 Answers

8

If the string isn't going to change and if its lifetime is guaranteed to be longer than you are going to use the string, then don't use std::string.

Instead, consider a simple C string wrapper, like the proposed string_ref<T>.
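To make the idea concrete, here is a minimal sketch of such a wrapper. It is not the proposed string_ref itself; the class name byte_ref and its members are hypothetical, chosen only for illustration. It stores a pointer and an explicit length, so embedded zero bytes are fine, and taking a substring is just pointer arithmetic.

```cpp
#include <cassert>
#include <cstddef>
#include <cstring>
#include <string>

// Hypothetical non-owning view over a byte range (illustration only).
// It never copies or frees the buffer; the caller must keep the buffer alive.
class byte_ref {
public:
    byte_ref(const char* data, std::size_t len) : data_(data), len_(len) {}

    const char* data() const { return data_; }
    std::size_t size() const { return len_; }

    char operator[](std::size_t i) const {
        assert(i < len_);
        return data_[i];
    }

    // Sub-view of at most `count` bytes starting at `pos`; no allocation.
    byte_ref substr(std::size_t pos, std::size_t count) const {
        assert(pos <= len_);
        if (count > len_ - pos) count = len_ - pos;  // clamp to the end
        return byte_ref(data_ + pos, count);
    }

    // Explicit, opt-in copy for the few places that really need a std::string.
    std::string to_string() const { return std::string(data_, len_); }

    // Byte-wise comparison that does not stop at zero bytes.
    friend bool operator==(const byte_ref& a, const byte_ref& b) {
        return a.len_ == b.len_ &&
               (a.len_ == 0 || std::memcmp(a.data_, b.data_, a.len_) == 0);
    }

private:
    const char* data_;
    std::size_t len_;
};
```

Because the length is stored rather than inferred, nothing here ever scans for a terminator, and the only copy happens when to_string() is called on purpose.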

James McNellis
  • I am using strings that contain NULLs, so I can't do this. – vy32 Oct 08 '12 at 19:17
  • 2
    @vy32: A `string_ref` can handle a string containing null characters (it has a constructor that takes a pointer to the initial element and the length). I don't know whether the LLVM or Chromium implementations linked from that proposal include that member (I have not looked at their source), but if not it would be trivial to add such a member. – James McNellis Oct 08 '12 at 19:21
  • OP can also use `boost::range` for this. – Mooing Duck Oct 08 '12 at 19:37
  • The in-place `substr` on `string_ref` requires that the length be stored explicitly, not inferred by searching for a terminator. – Ben Voigt Oct 08 '12 at 20:49
5

Binary data? Stop using std::string and use std::vector<char>. But that won't fix your issue of it being copied. From your description, if this huge 64MB buffer will never change, you shouldn't be using either std::string or std::vector<char>, because both own their storage and therefore copy the data. You really ought to be passing around a const char* pointer (const uint8_t* would be more descriptive of binary data, but under the covers it's the same thing, neglecting sign issues). Pass around both the pointer and a size_t length, or pass the pointer together with an 'end' pointer. If you don't like passing around separate variables (a pointer and the buffer's length), make a struct that describes the buffer and have everyone use that instead:

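#include <stddef.h>  // size_t
#include <stdint.h>  // uint8_t

// Non-owning descriptor: a pointer to the first byte plus the byte count.
struct binbuf_desc {
    const uint8_t* addr;  // const, because the buffer is never modified
    size_t len;
    binbuf_desc(const uint8_t* addr, size_t len) : addr(addr), len(len) {}
};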

You can always refer to your 64MB buffer (or any other buffer of any size) by using binbuf_desc objects. Note that binbuf_desc objects don’t own the buffer (or a copy of it), they’re just a descriptor of it, so you can just pass those around everywhere without having to worry about binbuf_desc’s making unnecessary copies of the buffer.
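As a rough sketch of how such descriptors get passed around, the block below repeats the struct (with a const pointer, which is my assumption since the buffer is read-only) and adds two helpers. The names slice and count_byte are hypothetical and exist only for this example.

```cpp
#include <cassert>
#include <stddef.h>  // size_t
#include <stdint.h>  // uint8_t

// Non-owning descriptor of a byte range; copying it copies two words, not the data.
struct binbuf_desc {
    const uint8_t* addr;
    size_t len;
    binbuf_desc(const uint8_t* addr, size_t len) : addr(addr), len(len) {}
};

// Hypothetical helper: descriptor for `count` bytes starting at `offset`.
inline binbuf_desc slice(binbuf_desc b, size_t offset, size_t count) {
    assert(offset <= b.len && count <= b.len - offset);
    return binbuf_desc(b.addr + offset, count);
}

// Hypothetical consumer: counts how often `value` occurs in the described range.
inline size_t count_byte(binbuf_desc b, uint8_t value) {
    size_t n = 0;
    for (size_t i = 0; i < b.len; ++i) {
        if (b.addr[i] == value) ++n;
    }
    return n;
}
```

Passing binbuf_desc by value is deliberate here: it is as cheap as passing a pointer and a length separately, and a slice still points into the original 64MB buffer.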

phonetagger
  • Well, I understand why you think we should use std::vector, however there are a lot of string built-ins that are useful from time to time. We do have a zero-copy class that does things properly, but we have been trying to avoid putting all of the string methods into it. – vy32 Oct 09 '12 at 15:32
  • 1
    What string built-ins do you find useful? – phonetagger Oct 09 '12 at 15:33
  • So this is what I ended up doing and it worked quite well! – vy32 Oct 08 '18 at 01:46
3

There is no portable solution. If you tell us what toolchain you're using, someone might know a trick specific to your library implementation. But for the most part, the std::string destructor (and assignment operator) is going to free the string content, and you can't free memory the string doesn't own, such as a string literal. (It's not impossible to have exceptions to this, and in fact the small string optimization is a common case that skips deallocation, but these are implementation details.)

A better approach is to not use std::string when you don't need/want dynamic allocation. const char* still works just fine in modern C++.
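For binary data, the usual trick with a plain pointer is to carry the length alongside it and use the mem* functions rather than the str* ones. A small sketch, where find_byte and bytes_equal are hypothetical helper names I made up for the example:

```cpp
#include <cassert>
#include <cstddef>
#include <cstring>

// Hypothetical helper: offset of the first `value` in [buf, buf + len),
// or `len` if the byte does not occur. Zero bytes are ordinary data here.
inline std::size_t find_byte(const char* buf, std::size_t len, char value) {
    assert(buf != nullptr || len == 0);
    if (len == 0) return 0;
    const void* hit = std::memchr(buf, static_cast<unsigned char>(value), len);
    return hit ? static_cast<std::size_t>(static_cast<const char*>(hit) - buf)
               : len;
}

// Hypothetical helper: byte-wise equality of two (pointer, length) ranges.
inline bool bytes_equal(const char* a, std::size_t alen,
                        const char* b, std::size_t blen) {
    return alen == blen && (alen == 0 || std::memcmp(a, b, alen) == 0);
}
```

Neither function allocates or looks for a terminator, so a range such as "ab\0cd" with length 5 is handled as five bytes, whereas strlen would stop after two.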

Ben Voigt
  • I'm using `g++` with the gnu stdlib++. But `std::string` implements shared strings and copy-on-write, so I think that you are mistaken about the destructor being called. And `const char *` has many, many problems, as you will discover if you start handling binary data – vy32 Oct 08 '12 at 19:37
  • 1
    @vy32: `const char*` works just fine with binary data. Just pass the length, instead of using `strlen`. Also, I don't believe that shared strings with COW are permitted by the C++ standard. – Ben Voigt Oct 08 '12 at 20:48
  • Specifically, copy-on-write violates the iterator validity requirements (some writes to strings are not permitted to invalidate iterators). So I am quite sure that stdlib++ does not implement copy-on-write for `std::string`. – Ben Voigt Oct 08 '12 at 20:53
3

Since C++17, std::string_view may be the way to go. It can be initialized from a bare C string (with or without a length) or from a std::string.

Note, though, that the data() method is not guaranteed to return a zero-terminated string.

If you need this "zero-terminated on request" behaviour, there are alternatives such as str_view from Adam Sawicki, which looks like a good fit (https://github.com/sawickiap/str_view)
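A short sketch of the std::string_view route, assuming a C++17 compiler. The helper names make_view and record are hypothetical; the std::string_view calls themselves (the pointer and length constructor, substr, find, data, size) are standard.

```cpp
#include <cassert>
#include <cstddef>
#include <string>
#include <string_view>

// Hypothetical helper: a view over `len` bytes at `buf`. Passing the length
// explicitly means embedded zero bytes are kept and nothing is scanned.
inline std::string_view make_view(const char* buf, std::size_t len) {
    assert(buf != nullptr || len == 0);
    return std::string_view(buf, len);
}

// Hypothetical helper: up to `count` bytes starting at `offset`.
// substr on a string_view returns another view, so no bytes are copied.
inline std::string_view record(std::string_view whole,
                               std::size_t offset, std::size_t count) {
    assert(offset <= whole.size());
    return whole.substr(offset, count);
}
```

A view returned by record points into the original buffer, and the byte just past its end is whatever follows in that buffer, not necessarily a zero. If a callee really needs an owning string, std::string(view) makes the copy explicit.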

armel
  • 1
    or if you're using C++20 a [`span`](https://en.cppreference.com/w/cpp/container/span) might be more appropriate – Sam Mason May 05 '22 at 10:06
  • @SamMason indeed, it is then a matter of whether the API to be called absolutely wants a zero char at the end of the string. Hopefully, the need for this is getting rarer and `span` is enough for most cases :-) By the way, before C++20, `gsl::span` does the job. – armel May 10 '22 at 09:00
  • for those with ideological opposition against microsoft: the GSL being referred to above likely implies using a [microsoft project](https://github.com/microsoft/GSL), and isn't a reference to the GNU Scientific Library – Sam Mason May 10 '22 at 12:46
1

It seems that using const char * instead of std::string is the best way to go for you. But you should also look at how you are using strings. Implicit conversions from char pointers to std::string objects may be happening without you noticing, for example when you pass a char pointer to a function that takes a std::string parameter.
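One way to see such a hidden conversion is to compare addresses inside the callee. This is only an illustrative sketch and both function names are hypothetical: the first takes a std::string, so calling it with a char pointer builds a temporary that copies the bytes; the second takes a pointer and a length and sees the original buffer.

```cpp
#include <cassert>
#include <cstddef>
#include <string>

// Takes a std::string: a call with a const char* argument silently constructs
// a temporary std::string, which allocates (or uses its small buffer) and copies.
inline bool sees_original_via_string(const std::string& s, const char* original) {
    assert(original != nullptr);
    return s.data() == original;  // false: s owns its own storage
}

// Takes pointer + length: no conversion, no copy.
inline bool sees_original_via_pointer(const char* s, std::size_t len,
                                      const char* original) {
    assert(original != nullptr);
    (void)len;  // the length travels with the pointer; unused in this check
    return s == original;  // true when the caller passes the buffer itself
}
```

Marking single-argument constructors of your own wrapper types explicit, or grepping for functions that take const std::string&, are two ways to find these call sites.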

Mooing Duck
shargors