1

My homework is remove duplicates in a random string. My idea is use 2 loops to solve the problem.

1st one will scan every character in the string. 2nd one will check that character is duplicated or not. If so, remove the character.

string content = "Blah blah..."

    for (int i = 0; i < content.size(); ++i) {
            char check = content.at(i);
            for (int j = i + 1; j < content.size() - 1; ++j) {
                if (check == content.at(j)) {
                    content.erase(content.begin()+j);

                }
            }
        }

The problem is it doesn't work. It always removes the wrong character. Seems an indices problem but I don't understand why.

A temporary fix is change content.erase(content.begin()+j); to content.erase( remove(content.begin() + i+1, content.end(), check),content.end());

But I think trigger a "remove by value" scan isn't a nice way. I want to do it with 2 loops or fewer.

Any ideas will be appreciated :)

melpomene
  • 84,125
  • 8
  • 85
  • 148
Tiana987642
  • 696
  • 2
  • 10
  • 28

6 Answers6

5

If use of STL is a possible option, you could use an std::unordered_set to keep the characters seen so far and the erase-remove idiom with std::remove_if, like in the following example:

#include <iostream>
#include <string>
#include <unordered_set>
#include <algorithm>

int main() {
  std::string str("Hello World!");
  std::unordered_set<char> log;
  std::cout << "Before: " << str << std::endl;
  str.erase(std::remove_if(str.begin(), str.end(), [&] (char const c) { return !(log.insert(c).second); }), str.end());
  std::cout << "After:  " << str << std::endl;
}

LIVE DEMO

101010
  • 41,839
  • 11
  • 94
  • 168
  • Nice idea :) Of course use STL is an option, but I want to improve my skill a bit so I try to think an algorithm first, use `` later :) Just my opinion – Tiana987642 Jul 01 '14 at 13:42
  • 1
    @Tiana987642 if it's for educational purposes I'm with you 100%. But when you really want to do serious programming, STL is not only highly recommended, but IMHO its use is mandatory. – 101010 Jul 01 '14 at 13:46
  • Thank, I appreciate your advice :) – Tiana987642 Jul 01 '14 at 14:21
5

Your loops could look the following way

#include <iostream>
#include <string>

int main() 
{
    std::string s = "Blah blah...";

    std::cout << '\"' << s << '\"' << std::endl;

    for ( std::string::size_type i = 0; i < s.size(); i++ )
    {
        std::string::size_type j = i + 1;
        while ( j < s.size() )
        {
            if ( s[i] == s[j] )
            {
                s.erase( j, 1 );
            }
            else
            {
                ++j;
            }
        }
    }

    std::cout << '\"' << s << '\"' << std::endl;

    return 0;
}

The output is

"Blah blah..."
"Blah b."

There are many other approaches using standard algorithms. For example

#include <iostream>
#include <string>
#include <algorithm>
#include <iterator>

int main() 
{
    std::string s = "Blah blah...";

    std::cout << '\"' << s << '\"' << std::endl;

    auto last = s.end();

    for ( auto first = s.begin(); first != last; ++first )
    {
        last = std::remove( std::next( first ), last, *first );
    }

    s.erase( last, s.end() );

    std::cout << '\"' << s << '\"' << std::endl;

    return 0;
}

The output is the same as for the previous code example

"Blah blah..."
"Blah b."
Vlad from Moscow
  • 301,070
  • 26
  • 186
  • 335
  • Thank you for your suggestion :) I learnt a few thing from your code :) If you have free time, please tell me my what's wrong in my code and workaround for it, like you mentioned in your comment :) – Tiana987642 Jul 01 '14 at 14:49
  • @Tiana987642 I think that it is enough to compare your loops with my loops in my first code example. It is obvious that this loop for (int j = i + 1; j < content.size() - 1; ++j) is invalid. It is not clear why condition content.size() - 1 is used and secondly when a character was deleted you must not increment i. – Vlad from Moscow Jul 01 '14 at 14:54
  • My idea is the first loop take a character (let's call it A) , the second will compare A with the rest and delete. So the 2nd will start after the position of A and is ended at the last of the string. At the time I wrote the code I didn't think carefully about this. Just don't want to out of bound. Thank you for point out my mistake :) – Tiana987642 Jul 01 '14 at 15:08
3

I recommend a two pass approach. The first pass identifies the positions of the duplicated characters; the second pass removes them.

I recommend using a std::set and a std::vector<unsigned int>. The vector contains letters that are in the string. The vector contains the positions of the duplicated letters.

The first pass detects if a letter is present in the set. If the letter exists, the position is appended to the vector. Otherwise the letter is inserted into the set.

For the second pass, sort the vector in descending order.
Erase the character at the position in the vector, then remove the position from the vector.

By erasing characters from the end of the string towards the front, the positions of the remaining duplicates won't change when the character is erased from the string.

Thomas Matthews
  • 56,849
  • 17
  • 98
  • 154
1

I am not sure that this is what is causing your problem, but another problem that I see with your code is in your second for loop. Your j < content.size() - 1 statement should just be

j < content.size().

The reasoning for this is a little tricky to see at first, but in this case you are not just getting the size of your vector to act as the size, but to act as the ending indices of your string. You are shortening the last indices by one which means you wont hit the last char in your string. I don't know if this will help your initial problem, but who knows?

DanielAden
  • 133
  • 1
  • 4
  • 11
1

Note: Your actual problem is maintaining a proper index to the next element in question:

  • If you do not erase a character, the next element is at the next position.
  • If you erase a character, the next element will move into the place of the current position (the position stays the same).

Also: There are more efficient solutions (eg.: utilizing a set)

1

sort first

then unique move all unique chars to the beginning and return past-the-end iterator.

then erase the useless chars

string digits("1a2b3c3c2b1a");
sort(digits.begin(), digits.end());
digits.erase( unique( digits.begin(), digits.end() ), digits.end() );
cout << digits << endl;

output

123abc
xitong
  • 352
  • 2
  • 9