11

This question risks being a duplicate e.g. remove double quotes from a string in c++ but none of the asnwers that I saw addresses my question
I have a list of strings, some of which are double quoted and some aren't, Quotes are always at beginning and end

std::vector<std::string> words = boost::assign::list_of("words")( "\"some\"")( "of which")( "\"might\"")("be quoted");

I am looking for the most efficient way to remove the quotes. Here is my attempt

for(std::vector<std::string>::iterator pos = words.begin(); pos != words.end(); ++pos)
{
  boost::algorithm::replace_first(*pos, "\"", "");
  boost::algorithm::replace_last(*pos, "\"", "");
  cout << *pos << endl;
}

Can I do better than this? I have potentially hundreds of thousands of string to process.They may come from a file or from a database. The std::vector in the example is just for illustration purposes.

Community
  • 1
  • 1
molita
  • 135
  • 2
  • 2
  • 7
  • Sounds like you're better off not having the quotes in the data in the first place? – Lightness Races in Orbit Sep 11 '11 at 14:36
  • 2
    @Tomalak if only we could choose what data we have to work with :) – Seth Carnegie Sep 11 '11 at 14:38
  • @Seth: Sometimes, we can. Sometimes, we _own_ the data source and are trying to fix the wrong thing. – Lightness Races in Orbit Sep 11 '11 at 14:39
  • @Potatoswatter The string have variable lengths. Some very short some very long – molita Sep 11 '11 at 14:41
  • How long is very long? 200 bytes? A kilobyte? A megabyte? – Potatoswatter Sep 11 '11 at 14:43
  • 1
    @Tomalak I doubt he'd impose weird restrictions and requirements on himself if he had a choice. – Seth Carnegie Sep 11 '11 at 14:44
  • If you want to get rid of all double-quotes, use [boost::algorithm::erase_all](http://www.boost.org/doc/libs/1_47_0/doc/html/boost/algorithm/erase_all.html). The question is not clear; if quotes can be only the very first and last character, then I'd just test `(*pos)[0]` and `(*pos)[pos->size()]` by hand. Can it happen that the double-quotes are unbalanced? Do you need to handle that as well? Unless you want to process hundreds of thousands of strings *every second*, I would not bother much with what will gain you a few msec. – eudoxos Sep 11 '11 at 14:45
  • @Potatoswatter A string can be up to up to 128 bytes. – molita Sep 11 '11 at 14:53
  • https://en.cppreference.com/w/cpp/io/manip/quoted – Marek R Jun 06 '22 at 10:24

4 Answers4

26

If you know the quotes will always appear in the first and last positions, you can do simply

if ( s.front() == '"' ) {
    s.erase( 0, 1 ); // erase the first character
    s.erase( s.size() - 1 ); // erase the last character
}

The complexity is still linear in the size of the string. You cannot insert or remove from the beginning of a std::string in O(1) time. If it is acceptable to replace the character with a space, then do that.

Potatoswatter
  • 134,909
  • 25
  • 265
  • 421
  • Some are quoted and some are not. – Seth Carnegie Sep 11 '11 at 14:44
  • @Seth: That's what the "if" does. We don't really have a problem description here, anyway. What about escaped quotes within the string? Who knows? – Potatoswatter Sep 11 '11 at 14:48
  • oops, my eyes skipped over the `if` expecting it to be the `for` that iterates the vector, sorry. Happens all the time to me. And yeah, the problem is poorly specified. – Seth Carnegie Sep 11 '11 at 14:49
  • @Potatoswatter This looks elegant and aimple the problem is that I may erase a non quote at the end if it happens that a string has quotes only at the beginning ( case of bad data ) – molita Sep 11 '11 at 14:58
  • @molita if data isn't allowed to have a quote only at the beginning or only at the end and it _does_, then you can't assure people using the program that it will produce correct results. Garbage in, garbage out. – Seth Carnegie Sep 11 '11 at 15:07
  • 2
    I consider this answer better than Seth's because it's simpler and costs less allocation. Specifically, Seth's will always allocate a new string from the substr call, while the `s.erase(0,1)` in this one will copy the characters down in-place. The `s.erase(s.size()-1)` would be a bit more readable as `s.pop_back()`, and is roughly two assignments, so probably negligible. – Jeffrey Yasskin Dec 28 '11 at 07:32
  • @JeffreyYasskin The only reason I didn't use `pop_back` is that it is only new with C++11. – Potatoswatter Dec 28 '11 at 08:35
  • Whoops, my mistake. Sorry about that. – Jeffrey Yasskin Dec 28 '11 at 18:09
5

It would probably be fast to do a check:

for (auto i = words.begin(); i != words.end(); ++i)
    if (*(i->begin()) == '"')
        if (*(i->rbegin()) == '"')
            *i = i->substr(1, i->length() - 2);
        else
            *i = i->substr(1, i->length() - 1);
    else if (*(i->rbegin()) == '"')
        *i = i->substr(0, i->length() - 1);

It might not be the prettiest thing ever, but it's O(n) with a small constant.

Seth Carnegie
  • 73,875
  • 22
  • 181
  • 249
  • I will wait for experts to weigh in since I am not sure I understand what your code is doing – molita Sep 11 '11 at 15:04
  • @molita what don't you understand about it? Here's the words: If the first character is a " then: if the last character is a ", chop the first and last characters off, else chop the first character off. Else, if the last character is a ", chop the last character off. – Seth Carnegie Sep 11 '11 at 15:04
  • Actually now I think I do. Initially I saw lots of *s and thought you were doing raw pointer arithmetic which I don't quite understand but you are dereferencing the iterators, now it makes sense. – molita Sep 11 '11 at 15:09
  • Did you test this? I am getting errors ( Visual C++ 9 Express). I replaced auto with std::string::iterator i but getting error C2839: invalid return type 'char *' for overloaded 'operator ->' 1>d:\cpp\2011\108\genericvalue\main.cpp(372) : error C2039: 'begin' : is not a member of 'std::_String_iterator<_Elem etc – molita Sep 11 '11 at 16:31
  • @molita yeah I tested it and it worked except that I accidentally put `rend` instead of `rbegin`. I updated the code, and this updated version works as expected (I updated my answer too). Here's the output: http://ideone.com/2WaEn – Seth Carnegie Sep 11 '11 at 16:50
  • I like your answer. It does exactly what I attempted but using the std library. So I accept it unless someone points out anything wrong – molita Sep 11 '11 at 17:20
  • @molita if you're not sure that you won't have any empty strings, you might want to add `if (i->length() < 0) continue;` to the top because that code will crash if you try to give it an empty string. – Seth Carnegie Sep 11 '11 at 17:21
  • That is easy enough to guard against. Nit pick. I have an issue with your lack of braces though. I easily get lost when braces are missing on if and loop statements – molita Sep 11 '11 at 17:28
  • @molita you can easily add those to your actual source. I prefer to use braces only when necessary. In fact, the `for` in my example doesn't even need braces. I'll fix that now :) – Seth Carnegie Sep 11 '11 at 17:31
  • The last bit is probably inefficient. Change `*i = i->substr(0, i->length() - 1);` to `i->resize(i.size()-1)`. That will very likely save an O(n) copy and just decrement the internal `size` field, chopping off the last `"`. – MSalters Sep 12 '11 at 08:56
  • 2
    If you've got C++11, you can change the ugly `*(i->begin())` and `*(i->rbegin())` to `i->front()` and `i->back()`. Much more readable. – MSalters Sep 12 '11 at 08:59
  • 1
    You have a problem if your string is one char long and it's a `"` – Colin Pitrat May 27 '16 at 18:57
0

The most efficient way for modern C++ is:

  if (str.size() > 1) {
    if (str.front() == '"' && str.back() == '"') {
      if (str.size() == 2) {
        str.erase();
      } else {
        str.erase(str.begin());
        str.erase(str.end() - 1);
      }
    }
  }

Rationale:

  • The erase() function modifies the string instead of reallocating it.
  • Calling front() on empty strings triggers undefined behavior.
  • This code is open to the possibility that the compiler deduces the intention of the two erase calls and optimize the code further (removing the first and last char together is a standard problem).
Andreas Spindler
  • 7,568
  • 4
  • 43
  • 34
-3

This is how I would approach the situation:

  • Start Simple: Begin with the simplest approach that does the job, like Potatoswatter's answer.
  • Don't Store Quoted Strings: If you can help it, don't store quoted strings at all. Check and unquote strings where ever you are creating the std::vector<std::string> in the first place. If you are simply receiving a std::vector<std::string> there isn't too much you can do as removing the first quote will require copying the rest of the string.
  • Profile/Benchmark: You may be surprised how fast a few 100000 strings can be iterated through and how little any amount of micro-optimizing will get you in the end. There will always be some cases where you do need every little bit of speed but make sure understand how to achieve the biggest gains (which profiling will tell you).
  • Worst Case: If you absolutely have to prevent copying the entire string when unquoting then store an index/iterator to the first "real" character. This may actually be slower with "short" strings but may work with "long" strings (i.e., megabytes in size). You could also create, or find, a string class that handles moving the string start without copying but this would be my last choice.
uesp
  • 6,194
  • 20
  • 15