I've got a project in which I need to read/write large files.

I've decided to use ifstream::read() to load each file into memory in a single pass, into a std::string. (That seems to be the fastest way to do it in C++: http://insanecoding.blogspot.com/2011/11/how-to-read-in-file-in-c.html and http://insanecoding.blogspot.com/2011/11/reading-in-entire-file-at-once-in-c.html)
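For reference, the single-pass read described above can be sketched roughly as follows. This is a minimal sketch, not the exact code from the linked posts: the helper name `read_whole_file` is hypothetical, and it assumes the file fits in memory and is opened in binary mode so that the size reported by tellg() matches the number of bytes read.

```cpp
#include <cassert>
#include <cstddef>
#include <fstream>
#include <stdexcept>
#include <string>

// Hypothetical helper: reads an entire file into a std::string
// with a single ifstream::read() call.
inline std::string read_whole_file(const char* path) {
    assert(path != 0);
    std::ifstream in(path, std::ios::in | std::ios::binary);
    if (!in) throw std::runtime_error("cannot open file");

    // Seek to the end to learn the file size, then rewind.
    in.seekg(0, std::ios::end);
    const std::streamoff end = in.tellg();
    if (end < 0) throw std::runtime_error("cannot determine file size");
    in.seekg(0, std::ios::beg);

    // Size the string once, then read directly into its buffer.
    std::string contents(static_cast<std::size_t>(end), '\0');
    if (!contents.empty()) {
        in.read(&contents[0], static_cast<std::streamsize>(contents.size()));
    }
    if (!in) throw std::runtime_error("read failed");
    return contents;
}
```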

When switching between files, I then need to "reset" the std::string used as the previous memory buffer (i.e., release its underlying char[] buffer to free the memory).

I tried:

std::string::clear()
std::string::assign("")
std::string::erase(0, std::string::npos)
std::string::resize(0)
std::string::reserve(0)

But under Visual Studio 2008, none of these frees the memory held by the std::string: its underlying buffer isn't deallocated.

The only way I found to release it is to call std::string::swap(std::string("")), which exchanges the internal buffers of the actual std::string and the empty one passed as the argument.

I find this behaviour a bit strange...

I only tested on Visual Studio 2008, so I don't know whether this is standard STL behaviour or MSVC-specific.

Could you give me a clue?

genpfault
adrien.pain
  • 5
    Swapping is a standard way of making containers release reserved memory. And reading file using `std::string` is way off from the optimal way. –  Dec 05 '11 at 14:22
  • 1
    @VladLazarenko: standard, and possibly fastest. – Nawaz Dec 05 '11 at 14:23
  • 3
    Why **would** you expect anyone to deallocate the buffer? C++11 adds the explicit `shrink_to_fit()` to make a non-binding request for deallocation. – Kerrek SB Dec 05 '11 at 14:23
  • @Kerrek SB: thank you, I haven't tried C++11 yet, as Visual 2005/2008 are the only compilers allowed in my company :/ – adrien.pain Dec 05 '11 at 14:35
  • @Vlad Lazarenko: which ways are fastest for reading large files? Memory mapping? I read the file in a single pass, using ifstream.tellg() to reserve a large enough buffer in my std::string and ifstream.read() to put the whole file into memory. I checked the implementation of ifstream::read() in Visual Studio 2008 and it doesn't use any internal buffer (it puts the data directly into the buffer passed as argument), so I don't really see a faster way to do that in C++. – adrien.pain Dec 05 '11 at 14:37
  • 1
    Swapping is the idiomatic way to shrink the *capacity* in C++98/C++03. In C++11 you have a method **`shrink_to_fit`** that by C++11 §21.4.4/14 "is a non-binding request to reduce capacity() to size(). [Note: The request is non-binding to allow latitude for implementation-specific optimizations. —end note ]". Cheers & hth., – Cheers and hth. - Alf Dec 05 '11 at 14:51
  • See answer to this question: [Copy data from fstream to stringstream with no buffer?](http://stackoverflow.com/questions/4064601/copy-data-from-fstream-to-stringstream-with-no-buffer) – Peter Wood Dec 07 '11 at 08:41

1 Answer

As Vlad and Alf commented, std::string().swap(the_string) is the C++98 way to release the_string's capacity, and the_string.shrink_to_fit() is the C++11 way.
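A minimal sketch of both approaches, for illustration. The helper names `release_capacity` and `request_shrink` are hypothetical, and the second helper needs a C++11 library, so it would not be available in Visual Studio 2008.

```cpp
#include <cassert>
#include <string>

// Hypothetical helper: the C++98/C++03 swap idiom.
// The temporary empty string takes over s's buffer and frees it
// when the temporary is destroyed at the end of the statement.
inline void release_capacity(std::string& s) {
    std::string().swap(s);
    assert(s.empty());
}

// Hypothetical helper: the C++11 alternative.
// clear() sets the size to 0; shrink_to_fit() is then a non-binding
// request to reduce capacity() to size().
inline void request_shrink(std::string& s) {
    s.clear();
    s.shrink_to_fit();
}
```

Calling swap() on the temporary, rather than passing the temporary as the argument, keeps the code portable: in standard C++ a temporary cannot bind to swap()'s non-const reference parameter.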

As to why clear(), erase(), resize(), etc. don't do it, this is an optimization to reduce allocations when you use a string over and over. If clear() freed the string's capacity, you'd generally have to reallocate a similar amount of space on the next iteration, which would cost time that the implementation saves by keeping the capacity around. This behavior isn't guaranteed by the standard, but it's very common in implementations.
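One way to observe this on a given implementation is to compare capacity() before and after clear(). The sketch below uses a hypothetical `ClearReport` struct and `clear_and_report` helper; the values it reports depend on the standard library in use.

```cpp
#include <cassert>
#include <cstddef>
#include <string>

// Hypothetical type for illustration: capacities observed around clear().
struct ClearReport {
    std::size_t capacity_before;
    std::size_t capacity_after;
};

// Hypothetical helper: fills a string with n characters, clears it,
// and reports the capacity before and after the clear() call.
inline ClearReport clear_and_report(std::size_t n) {
    std::string buffer(n, 'x');
    ClearReport report;
    report.capacity_before = buffer.capacity();
    buffer.clear();                 // size becomes 0
    assert(buffer.empty());
    report.capacity_after = buffer.capacity();
    return report;
}
```

On implementations that keep the buffer, `capacity_after` stays at roughly `n` even though the string is empty, which is the memory the question sees as "not freed".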

reserve() is documented with:

> Calling reserve() with a res_arg argument less than capacity() is in effect a non-binding shrink request. A call with res_arg <= size() is in effect a non-binding shrink-to-fit request.

which implies that implementations are more likely to release the capacity on a reserve() call. If I'm reading them right, libc++ and libstdc++ do release space when you call reserve(0), but it's plausible for VC++'s library to have made the opposite choice.
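To check what a particular library does, a small probe like the one below can help. The helper name `capacity_after_reserve_zero` is hypothetical, and the result is implementation-dependent; note also that since C++20 reserve() is specified never to shrink, so a modern toolchain may leave the capacity untouched.

```cpp
#include <cassert>
#include <cstddef>
#include <string>

// Hypothetical helper: empties the string, issues reserve(0), and
// returns the resulting capacity so the caller can compare it with
// the capacity measured beforehand.
inline std::size_t capacity_after_reserve_zero(std::string& s) {
    s.clear();
    s.reserve(0);   // a shrink request at most; the library may ignore it
    assert(s.empty());
    return s.capacity();
}
```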

Edit: As penelope says, std::string's behavior here tends to be exactly the same as std::vector's behavior.

Jeffrey Yasskin
  • 1
    I'd just like to add... `string`s behave mostly like `vector`s... and if you add data to a `std::vector`, when its size reaches the reserved capacity, its capacity typically doubles (and nothing has to happen when the size diminishes). This way, insertion at the back of the vector takes amortized constant time while staying memory-efficient: reallocation (a slow operation) happens exponentially less often over time, while the capacity of the vector is never more than about twice the needed size – penelope Dec 27 '11 at 10:19