
I have a string that I get from an ostringstream. I'm currently trying to replace some characters in this string (content.replace(content.begin(), content.end(), "\n", "");) but sometimes I get an exception:

malloc: *** mach_vm_map(size=4294955008) failed (error code=3)
*** error: can't allocate region
*** set a breakpoint in malloc_error_break to debug
std::bad_alloc

I suspect that this happens because the string is too big. What's the best practice for these situations? Declare the string on the heap?

Update

My full method:

xml_node HTMLDocument::content() const {
  xml_node html = this->doc.first_child();
  xml_node body = html.child("body");
  xml_node section = body.child("section");
  std::ostringstream oss;
  if (section.type() != xml_node_type::node_null) {
    section.print(oss);
  } else {
    body.print(oss);
  }
  string content;
  content = oss.str();
  content.replace(content.begin(), content.end(), "<section />", "<section></section>");
  content.replace(content.begin(), content.end(), "\t", "");
  xml_node node;
  return node;
}
Baum mit Augen
ruipacheco
  • If you're looking for help with this specific problem, I think you'll need to provide a [minimal, verifiable and complete example](http://stackoverflow.com/help/mcve) – Yann Sep 29 '14 at 14:48
  • There's a decent chance that the error has nothing to do with this particular piece of code. Did you try running this with valgrind? – Sergey Kalinichenko Sep 29 '14 at 14:48
  • I can't run valgrind on OSX. – ruipacheco Sep 29 '14 at 14:51
  • Perhaps you want the boost::string methods, particularly [boost::algorithm::replace_all](http://www.boost.org/doc/libs/1_56_0/doc/html/boost/algorithm/replace_all.html) – gbjbaanb Sep 29 '14 at 15:03

4 Answers


There is no std::string::replace overload that accepts a pair of iterators, a const char* to search for, and a const char* to use as the replacement, and this is where your problem comes from:

content.replace(content.begin(), content.end(), "\n", "");

matches the following overload:

template <class InputIterator>
string& replace(iterator i1, iterator i2,
                InputIterator first, InputIterator last);

that is, "\n" and "" are treated as the range [first, last). Since the two literals are unrelated arrays, this is undefined behavior: depending on where they happen to sit in memory, your program may or may not crash.

You have to either use std::regex or implement your own logic that iterates through std::string and replaces any encountered pattern with a replacement string.
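A minimal sketch of the "implement your own logic" option, assuming a C++11 compiler. replace_all is a hypothetical helper name, not a standard function. It locates each match with std::string::find and then uses the (position, length) overload of replace, which is the overload actually meant for this job.

```cpp
#include <string>

// Hypothetical helper: replaces every non-overlapping occurrence of `from`
// in `s` with `to`, scanning left to right. Modifies `s` in place.
void replace_all(std::string& s, const std::string& from, const std::string& to)
{
    if (from.empty()) {
        return; // an empty pattern would match everywhere and never advance
    }
    std::string::size_type pos = 0;
    while ((pos = s.find(from, pos)) != std::string::npos) {
        s.replace(pos, from.size(), to); // position/length overload, no iterator pairs
        pos += to.size();                // skip past the text just inserted
    }
}
```

Advancing pos by to.size() after each replacement means freshly inserted text is never rescanned, so a call such as replace_all(s, "a", "aa") terminates. With it, the question's two lines would become replace_all(content, "<section />", "<section></section>"); and replace_all(content, "\t", "");.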

Piotr Skotnicki
  • He could be using the [replace function from algorithm](http://www.cplusplus.com/reference/algorithm/replace/), which does take two iterators plus an old and a new char parameter. – gbjbaanb Sep 29 '14 at 14:56
  • @gbjbaanb: no, there is no *empty character* `''`, and the OP is also trying to replace an entire string. – Piotr Skotnicki Sep 29 '14 at 14:57
  • ah yes - the title says replace characters but the code says remove them. So sloppy :) – gbjbaanb Sep 29 '14 at 15:00
  • For the record, I've used the solution in this question: https://stackoverflow.com/questions/3418231/replace-part-of-a-string-with-another-string – ruipacheco Sep 29 '14 at 15:11
  • There's no way `""` can match `size_t`; `size_t` is an integral type, and `""` is an array, which can convert to a pointer, but not to an integral type. – James Kanze Sep 29 '14 at 15:39

The lines:

content.replace(content.begin(), content.end(), "<section />", "<section></section>");
content.replace(content.begin(), content.end(), "\t", "");

result in undefined behavior. They match the function:

template<class InputIterator>
std::string& std::string::replace(
    const_iterator i1, const_iterator i2,
    InputIterator j1, InputIterator j2);

with InputIterator resolving to char const*. The problem is that the distance between the two iterators, and whether the second can be reached from the first, is undefined, since they point to totally unrelated bits of memory.

From your code, I don't think you understand what std::string::replace does. It replaces the range [i1,i2) in the string with the text defined by the range [j1,j2). It does not do any searching or comparison; it is for use after you have found the range which needs replacing. Calling:

content.replace(content.begin(), content.end(), "<section />", "<section></section>");

has exactly the same effect as:

content = std::string( "<section />", "<section></section>");

That is certainly not what you want.
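To illustrate the intended use of the iterator overload, here is a sketch that first finds the range to replace and only then calls replace, with [j1,j2) taken from a single std::string so it is a valid range. replace_first is a hypothetical helper, and it handles only the first occurrence.

```cpp
#include <string>

// Hypothetical helper: find first, then replace. Only the first match is changed.
void replace_first(std::string& content, const std::string& from, const std::string& to)
{
    std::string::size_type pos = content.find(from);
    if (pos == std::string::npos) {
        return; // nothing to replace
    }
    // [i1, i2) is the occurrence we found inside `content`;
    // [j1, j2) spans all of `to`, a well-formed range within one object.
    content.replace(content.begin() + pos,
                    content.begin() + pos + from.size(),
                    to.begin(), to.end());
}
```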

In C++11, there's a regex_replace function which may be of some use, although if you're really doing this on very large strings, it may not be the most performant (the added flexibility of regular expressions comes at a price); I'd probably use something like:

#include <algorithm>
#include <string>

std::string
searchAndReplace(
    std::string const& original,
    std::string const& from,
    std::string const& to)
{
    //  an empty pattern would match at every position and never advance
    if ( from.empty() ) {
        return original;
    }
    std::string results;
    std::string::const_iterator current = original.begin();
    std::string::const_iterator end = original.end();
    std::string::const_iterator next = std::search( current, end, from.begin(), from.end() );
    while ( next != end ) {
        results.append( current, next );    //  text before the match
        results.append( to );               //  the replacement
        current = next + from.size();       //  resume just past the match
        next = std::search( current, end, from.begin(), from.end() );
    }
    results.append( current, next );        //  the tail (next == end here)
    return results;
}

For very large strings, some heuristic for guessing the size, and then doing a reserve on results is probably a good idea as well.
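One possible way to do that sizing, sketched under the assumption that an extra scan is cheaper than the reallocations it avoids (worth measuring): instead of guessing, count the matches first and reserve the exact final length. searchAndReplaceReserved is a hypothetical name; the second pass is the same loop as searchAndReplace above.

```cpp
#include <algorithm>
#include <cstddef>
#include <string>

// Hypothetical variant of searchAndReplace that reserves the result up front.
std::string searchAndReplaceReserved(std::string const& original,
                                     std::string const& from,
                                     std::string const& to)
{
    if (from.empty()) {
        return original; // nothing sensible to search for
    }
    // Pass 1: count non-overlapping matches, leftmost first (as std::search finds them).
    std::size_t count = 0;
    for (std::string::size_type pos = original.find(from);
         pos != std::string::npos;
         pos = original.find(from, pos + from.size())) {
        ++count;
    }
    // Exact final length; subtract before adding so size_t never wraps.
    std::string results;
    results.reserve(original.size() - count * from.size() + count * to.size());

    // Pass 2: build the result without further reallocation.
    std::string::const_iterator current = original.begin();
    std::string::const_iterator end = original.end();
    std::string::const_iterator next = std::search(current, end, from.begin(), from.end());
    while (next != end) {
        results.append(current, next);
        results.append(to);
        current = next + from.size();
        next = std::search(current, end, from.begin(), from.end());
    }
    results.append(current, end);
    return results;
}
```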

Finally, since your second line just removes '\t', you'd be better off using std::remove:

content.erase( std::remove( content.begin(), content.end(), '\t' ), content.end() );
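For comparison, a sketch of the C++11 std::regex_replace route mentioned in this answer, applied to the question's two substitutions. tidy_markup is a hypothetical name. Characters that are special in ECMAScript regex syntax would need escaping in the pattern; the two patterns here happen to contain none. As noted above, this is likely to be slower than searchAndReplace on very large strings.

```cpp
#include <regex>
#include <string>

// Hypothetical helper applying the question's two edits with std::regex_replace.
std::string tidy_markup(const std::string& content)
{
    // "<section />" -> "<section></section>"
    std::string out = std::regex_replace(content, std::regex("<section />"),
                                         "<section></section>");
    // remove every tab character
    return std::regex_replace(out, std::regex("\t"), "");
}
```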
James Kanze

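AFAIK the character data of an STL string is always allocated on the heap once the string grows beyond a certain (small) size, so declaring the string object itself on the heap won't change anything. Below that size, implementations typically store the characters inside the string object (the small-string optimization); Visual Studio, for example, does this for up to 15 characters.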

What you can do if you get allocation exceptions:

  • Use a custom allocator
  • Use a "rope" class.

A bad_alloc might not mean you've run out of memory; more likely, you've run out of contiguous memory. A rope class might suit you better, as it allocates the string in pieces internally.

gbjbaanb

This is one of the correct (and reasonably efficient) ways to remove characters from a string if you want to make a copy and leave the original intact:

#include <algorithm>
#include <string>

std::string delete_char(std::string src, char to_remove)
{
    // note: src is a copy so we can mutate it

    // shift every character we keep towards the front; remove_if returns an
    // iterator to the new logical end (the tail holds unspecified leftovers)
    auto begin_junk = std::remove_if(src.begin(),
                                     src.end(),
                                     [&to_remove](const char c) { return c == to_remove; });
    // chop off the leftover tail
    src.erase(begin_junk,
              src.end());

    // src is a by-value parameter, so returning it moves rather than copies
    return src;
}

called like this:

std::string src("a\nb\nc");
auto dest = delete_char(src, '\n');
assert(dest == "abc");

If you'd prefer to modify the string in place then simply:

src.erase(std::remove_if(src.begin(), src.end(), [](char c) { return c == '\n'; }), src.end());
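Where remove_if does earn its keep over std::remove is when more than one character has to go, for example both '\n' and '\t'. A minimal sketch; delete_chars is a hypothetical helper that applies the same erase/remove idiom with a set-membership predicate.

```cpp
#include <algorithm>
#include <string>

// Hypothetical helper: erase every character of `s` that appears in `to_remove`.
void delete_chars(std::string& s, const std::string& to_remove)
{
    s.erase(std::remove_if(s.begin(), s.end(),
                           [&to_remove](char c) {
                               // keep c unless it is one of the unwanted characters
                               return to_remove.find(c) != std::string::npos;
                           }),
            s.end());
}
```

Calling delete_chars(content, "\n\t") strips both kinds of whitespace in a single pass over the string.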
Richard Hodges
  • Why `std::remove_if`, and not simply `std::remove`? – James Kanze Sep 29 '14 at 16:06
  • why not indeed? There are many correct ways to skin a c++ cat. – Richard Hodges Sep 29 '14 at 16:10
  • But the simplest is usually best. Introducing a lambda here when there is a function which already does exactly what is wanted is unnecessary complication. – James Kanze Sep 29 '14 at 16:13
  • I think that's a fair comment. Nevertheless, the hope is that the OP learns the general principle of remove/erase in stl and then goes on to lead a happier life. :-) – Richard Hodges Sep 29 '14 at 16:15
  • Yes. That's a bit why I pointed out the algorithm using `std::search`. It's a bit more complicated than using `std::regex_replace`, but it introduces a useful pattern. – James Kanze Sep 29 '14 at 16:33