9

I am trying to split a string using spaces as a delimiter. I would like to store each token in an array or vector.

I have tried:

    string tempInput;
    cin >> tempInput;
    string input[5];

    stringstream ss(tempInput); // Insert the string into a stream
    int i=0;
    while (ss >> tempInput){
        input[i] = tempInput;
        i++;
    }

The problem is that if I input "this is a test", the array only seems to store input[0] = "this". It does not contain values for input[1] through input[3].

I have also tried using a vector but with the same result.

Mike
  • not really a dupe. It's "Where did I make an error" vs "What is the best way to..."? – SF. Apr 28 '10 at 08:20
  • While the question is exactly the same: how to split a string, I believe that the referred question by @pmr deals with the generic issue, while in this question the problem is not in the actual splitting – David Rodríguez - dribeas Apr 28 '10 at 08:21
  • @David @SF Yes, you are right. Unfortunately most of the answers don't treat the question that way. – pmr Apr 28 '10 at 08:29
  • likely duplicate http://stackoverflow.com/questions/236129/c-how-to-split-a-string – Jasmeet Apr 28 '10 at 08:17

4 Answers

7

Go to the duplicate questions to learn how to split a string into words, but your method is actually correct. The actual problem lies in how you are reading the input before trying to split it:

    string tempInput;
    cin >> tempInput; // !!!

When you use cin >> tempInput, you get only the first word of the input, not the whole text, because operator>> stops at the first whitespace. There are two ways around that; the simplest is to forget about the stringstream and iterate directly on the input:

    std::string tempInput;
    std::vector< std::string > tokens;
    while ( std::cin >> tempInput ) {
       tokens.push_back( tempInput );
    }

    // Alternative to the whole snippet above, using std::copy
    // (requires the <algorithm> and <iterator> headers):
    std::vector< std::string > tokens;
    std::copy( std::istream_iterator<std::string>( std::cin ),
               std::istream_iterator<std::string>(),
               std::back_inserter(tokens) );

This approach gives you all the tokens in the input in a single vector. If you need to work with each line separately, use getline from the <string> header instead of cin >> tempInput:

    std::string tempInput;
    while ( getline( std::cin, tempInput ) ) { // read line
       // tokenize the line, possibly with your own code or
       // any answer in the 'duplicate' question
    }
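As a rough sketch of the line-by-line approach, the loop body could hand each line to a small whitespace tokenizer. The helper names `tokenize_line` and `tokenize_lines` are made up for this illustration and are not part of any library; the stream is passed in as a parameter so that `std::cin` or any other `std::istream` can be used.

```cpp
#include <istream>
#include <sstream>
#include <string>
#include <vector>

// Hypothetical helper: split one line into whitespace-separated tokens.
std::vector<std::string> tokenize_line(const std::string& line) {
    std::istringstream ss(line);
    std::vector<std::string> tokens;
    std::string token;                 // separate from `line`, which stays intact
    while (ss >> token) {
        tokens.push_back(token);
    }
    return tokens;
}

// Hypothetical helper: read `in` line by line and tokenize each line.
std::vector<std::vector<std::string> > tokenize_lines(std::istream& in) {
    std::vector<std::vector<std::string> > result;
    std::string line;
    while (std::getline(in, line)) {   // reads the whole line, spaces included
        result.push_back(tokenize_line(line));
    }
    return result;
}
```

Calling `tokenize_lines(std::cin)` would process standard input; a `std::istringstream` works the same way, which is convenient for trying the code without typing input.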
David Rodríguez - dribeas
3

Notice that it’s much easier just to use copy:

    vector<string> tokens;
    copy(istream_iterator<string>(cin),
         istream_iterator<string>(),
         back_inserter(tokens));

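As for why your code doesn’t work: you first read a single word from cin, not the whole line. That’s why only a single word is put into the stringstream. You’re also reusing tempInput as both the source string and the loop variable. Don’t do that: it is not what breaks the code (the stringstream keeps its own copy of the string), but it makes the code harder to follow.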
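To make the difference concrete, here is a small sketch that contrasts the two behaviours. Both function names are invented for this example, and the stream is a parameter (rather than `cin` directly) only so the code can be exercised with a string stream.

```cpp
#include <algorithm>
#include <istream>
#include <iterator>
#include <sstream>
#include <string>
#include <vector>

// Mirrors the question's approach: one formatted extraction, then split.
std::vector<std::string> split_after_single_extraction(std::istream& in) {
    std::string tempInput;
    in >> tempInput;                    // stops at the first whitespace
    std::stringstream ss(tempInput);    // so the stream holds a single word
    std::vector<std::string> tokens;
    std::string token;
    while (ss >> token) {
        tokens.push_back(token);
    }
    return tokens;
}

// Reads every whitespace-separated word straight from the stream.
std::vector<std::string> split_with_copy(std::istream& in) {
    std::vector<std::string> tokens;
    std::copy(std::istream_iterator<std::string>(in),
              std::istream_iterator<std::string>(),
              std::back_inserter(tokens));
    return tokens;
}
```

Fed the text "this is a test", the first function yields one token and the second yields four.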

Konrad Rudolph
2

The easiest way: Boost.Tokenizer

    std::vector<std::string> tokens;

    std::string s = "This is,  a test";
    boost::tokenizer<> tok(s);
    for(boost::tokenizer<>::iterator it=tok.begin(); it != tok.end(); ++it)
    {
      tokens.push_back(*it);
    }

    // tokens is ["This", "is", "a", "test"]

You can parameterize the delimiters and escape sequences to split only on spaces if you wish; by default it tokenizes on both spaces and punctuation.

Matthieu M.
  • 6
    I wish people would stop instantly raising Boost as a solution. Many places, including where I work currently (and worked previously), have to spend months reviewing licenses and auditing _any_ open source project before it can be used, and it is generally not worth the pain and effort (not to mention the wait) before you get a green (or red) light. Also, if this is a homework question, the tutor would not be impressed if the student handed in Boost-riddled code. – graham.reeds Apr 28 '10 at 08:25
  • 7
    @graham.reeds: sorry to hear that but – tough luck. Boost is an – and very often **the** most appropriate – solution. Are you allowed to use the standard library? After all, it’s an open standard and its implementations are usually open source. In any case, blame your company, not Boost or helpful answers. :-( – Konrad Rudolph Apr 28 '10 at 08:29
  • 4
    @graham.reeds: and the alternative is hiding perfectly valid answers that can be used in other environments? What if someone asks how to parse XML in C++? Would you want to provide the implementation of an XML parser, or would you rather be referred to a library that does it? In simple cases such as this, the question will probably get both pure C++ and library-based solutions, and that, I believe, adds value rather than taking it away. (Note: I have not upvoted since I believe the real problem faced by @Mike is not tokenizing the string but rather how he reads the input) – David Rodríguez - dribeas Apr 28 '10 at 08:35
  • @David: good catch, I blindly followed the "split" problem and did not notice the problem he had actually reading it. – Matthieu M. Apr 28 '10 at 17:36
1

Here is a small algorithm that splits the string into a list on a given separator, much like Python's split does.

    #include <list>
    #include <string>

    // Splits text on every occurrence of split_word, keeping empty tokens.
    std::list<std::string> split(const std::string& text, const std::string& split_word) {
        std::list<std::string> list;
        if (split_word.empty()) {          // an empty separator would never advance
            list.push_back(text);
            return list;
        }
        std::string::size_type start = 0;  // beginning of the current token
        std::string::size_type pos;
        while ((pos = text.find(split_word, start)) != std::string::npos) {
            list.push_back(text.substr(start, pos - start));
            start = pos + split_word.length();  // jump over the separator
        }
        list.push_back(text.substr(start));     // remainder after the last separator
        return list;
    }

There probably exists a more optimal way to write this.
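For a single-character separator, one shorter possibility is std::getline with a delimiter argument. The sketch below uses two hypothetical helper names, `split_on_char` and `split_on_whitespace`, to show how delimiter-based splitting differs from whitespace extraction when separators repeat.

```cpp
#include <sstream>
#include <string>
#include <vector>

// Hypothetical helper: split on one delimiter character, keeping the empty
// tokens that appear between consecutive delimiters.
std::vector<std::string> split_on_char(const std::string& text, char delim) {
    std::istringstream ss(text);
    std::vector<std::string> tokens;
    std::string token;
    while (std::getline(ss, token, delim)) {
        tokens.push_back(token);
    }
    return tokens;
}

// For comparison: operator>> skips any run of whitespace between words.
std::vector<std::string> split_on_whitespace(const std::string& text) {
    std::istringstream ss(text);
    std::vector<std::string> tokens;
    std::string token;
    while (ss >> token) {
        tokens.push_back(token);
    }
    return tokens;
}
```

With two spaces between the words, `split_on_char` reports an empty token in the middle while `split_on_whitespace` does not. Note that this getline loop produces no final empty token for a trailing delimiter and no tokens at all for an empty string, so it is not an exact match for Python's behaviour.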

Fabitastic