I am splitting a string into a vector of strings:

    vector<string> tokens;

    stringstream strstm(str);
    string item;
    while (getline(strstm, item, ' ')) {
        tokens.push_back(item);
    }

    token_idx = 0;

    cout << "size = " << tokens.size() << endl;

    for (unsigned int i = 0; i < tokens.size(); i++)
    {
        cout << tokens[i] << "[" << i << "]" << endl;
    } 
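One property of the `getline`-based split above that may be worth checking: with `' '` as the delimiter, two consecutive spaces in the input produce an empty token. The sketch below contrasts it with extraction via `operator>>`, which skips runs of whitespace. The function names `splitOnSpace` and `splitOnWhitespace` are hypothetical and exist only for this illustration.

```cpp
#include <sstream>
#include <string>
#include <vector>

// Same technique as in the question: one token per ' ' delimiter.
// Consecutive spaces yield empty tokens.
std::vector<std::string> splitOnSpace(const std::string& str) {
    std::vector<std::string> tokens;
    std::stringstream strstm(str);
    std::string item;
    while (std::getline(strstm, item, ' ')) {
        tokens.push_back(item);
    }
    return tokens;
}

// Alternative (an assumption, not the asker's code): operator>> skips
// any run of whitespace, so no empty tokens are produced.
std::vector<std::string> splitOnWhitespace(const std::string& str) {
    std::vector<std::string> tokens;
    std::istringstream in(str);
    std::string item;
    while (in >> item) {
        tokens.push_back(item);
    }
    return tokens;
}
```

For the input `1 + 2 * 3` both functions return the same five tokens, so the split itself is unlikely to be the cause of the crash described below.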

The split succeeds, and `size()` and the elements are what I expect. However, the last token seems to behave strangely when I try to get its value.

    string Lexer::consume() {
        if (hasValue()) {
            token_idx++;
            cout << "consumed " << tokens[token_idx-1] << " tokens = " << token_idx -1 << endl;
            return tokens[token_idx-1];
        }
        cout << "didn't consume, token_idx = " << token_idx << endl;
        return "null";
    }

`hasValue()` looks like this:

    bool Lexer::hasValue() {
        if ( token_idx < tokens.size()) {
            return true;
        } else {
            return false;
        }
    }

If I have an input string such as `1 + 2 * 3`, the expected output from my program is `(+1(*23))`; however, I am getting a segmentation fault.

    size = 5
    1[0]
    +[1]
    2[2]
    *[3]
    3[4]
    consumed 1 tokens = 0
    consumed + tokens = 1
    consumed 2 tokens = 2
    consumed * tokens = 3
    consumed 3 tokens = 4
    Segmentation fault (core dumped)

But if I change the `hasValue()` check to `( token_idx < tokens.size() -1 )`, the program returns `(+1 (*2 null))`:

    size = 5
    1[0]
    +[1]
    2[2]
    *[3]
    3[4]
    consumed 1 tokens = 0
    consumed + tokens = 1
    consumed 2 tokens = 2
    consumed * tokens = 3
    didn't consume, token_idx = 4
    (+1 (*2 null))

So I'm wondering whether there is an end-of-line character after the `3` when splitting the way I did, or whether some other factor is contributing to this behaviour. I am fairly certain I am not going out of bounds on the vector, though.

Jason Hu
  • I used gdb on the core dump file, however the info it gives me is pretty vague and doesn't tell me what line in my code it crashed on. using command `gdb prefixer core.3211` I get `Core was generated by `./prefixer'. Program terminated with signal 11, Segmentation fault. #0 0x0000003b1229c0d3 in std::basic_string, std::allocator >::size() const () from /usr/lib64/libstdc++.so.6 Missing separate debuginfos, use: debuginfo-install glibc-2.12-1.47.el6_2.12.x86_64 libgcc-4.4.6-3.el6.x86_64 libstdc++-4.4.6-3.el6.x86_64` – Jason Hu May 28 '12 at 19:39
  • Are you compiling with g++? Using -g option? – Vaughn Cato May 28 '12 at 19:40
  • If you use `where` after the crash, you should definitely get some info. – Vincenzo Pii May 28 '12 at 19:40
  • @puller can you elaborate how to include the `where`. I'm not too familiar with working with core dump files :( – Jason Hu May 28 '12 at 19:42
  • @JasonHu just type `where` after gdb terminates the program run. – Vincenzo Pii May 28 '12 at 19:43
  • If you included a complete, runnable example that crashes, that would make it a lot easier for us to help you. I think what you're showing is not quite enough to troubleshoot this. – NPE May 28 '12 at 19:43
  • the example i think is quite large, with multiple header and cpp files unfortunately :( – Jason Hu May 28 '12 at 19:45
  • @JasonHu do you use `token_idx` to access the vector another time after the last `cout`? Its value is out of bounds at that time... – Vincenzo Pii May 28 '12 at 19:48
  • @puller thanks for your help! I was able to track down the function where it happened using `where`, it was an inspect function where I had `if (s == tokens[token_idx])` and at that point token_idx is already out of bounds. I added a `hasValue()` check in front of that and all is well! Thanks again. – Jason Hu May 28 '12 at 19:51
  • @JasonHu: try getting used to the commands in gdb, like bt to get a backtrace – PlasmaHH May 28 '12 at 20:38
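
The fix reported in the comments above (putting a `hasValue()` check in front of `s == tokens[token_idx]` in the inspect function) can be sketched roughly as follows. This is a minimal illustration and not the asker's actual class: the constructor and the exact signature of `inspect` are assumptions, since the full source was never posted.

```cpp
#include <cstddef>
#include <string>
#include <utility>
#include <vector>

// Hypothetical reduced Lexer; only the members discussed in the thread.
class Lexer {
public:
    explicit Lexer(std::vector<std::string> toks)
        : tokens(std::move(toks)), token_idx(0) {}

    // True while there is still an unconsumed token.
    bool hasValue() const { return token_idx < tokens.size(); }

    // Guarded look-ahead: the bounds check runs first, so
    // tokens[token_idx] is never evaluated past the end.
    bool inspect(const std::string& s) const {
        return hasValue() && s == tokens[token_idx];
    }

    // Returns the current token and advances, or "null" when exhausted.
    std::string consume() {
        if (hasValue()) {
            return tokens[token_idx++];
        }
        return "null";
    }

private:
    std::vector<std::string> tokens;
    std::size_t token_idx;
};
```

Because `&&` short-circuits, calling `inspect` after the last token has been consumed simply returns `false` instead of indexing out of bounds.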

1 Answer

I think the code that actually causes the error is not shown here, but judging from the way you manipulate the index, it is no mystery that you are accessing past the end of your token list. The design is also error-prone.

    if (hasValue()) { // has value is useless to me
        token_idx++;  // why incrementing this here ?

        cout << "consumed " << tokens[token_idx-1] << " tokens = " << token_idx -1 << endl;

        return tokens[token_idx-1];
    }

Change it to this:

    if ( token_idx < tokens.size() ) {
        cout << "consumed " << tokens[token_idx] << " tokens = " << token_idx << endl;

        return tokens[token_idx++];
    }

Also read about recursive descent parsing. It's really simple, and you will be much better informed about parsing and able to avoid common pitfalls.
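
To give an idea of what recursive descent looks like for this problem, here is a minimal sketch that turns `1 + 2 * 3` into the prefix form from the question. The class name `PrefixParser`, its helper names, and the tiny grammar (only `+` and `*`, no parentheses) are assumptions made for illustration; this is not the asker's program.

```cpp
#include <cstddef>
#include <sstream>
#include <string>
#include <vector>

// Assumed grammar (illustrative only):
//   expr   := term   ('+' term)*
//   term   := factor ('*' factor)*
//   factor := any single token, e.g. a number
class PrefixParser {
public:
    explicit PrefixParser(const std::string& input) : pos(0) {
        std::istringstream in(input);
        std::string tok;
        while (in >> tok) tokens.push_back(tok);  // whitespace-separated tokens
    }

    std::string parse() { return expr(); }

private:
    // Bounds-checked look-ahead: never indexes past the end.
    bool peekIs(const std::string& s) const {
        return pos < tokens.size() && tokens[pos] == s;
    }
    // Bounds-checked consume; yields "null" once the input is exhausted.
    std::string next() {
        return pos < tokens.size() ? tokens[pos++] : std::string("null");
    }

    std::string factor() { return next(); }

    std::string term() {
        std::string lhs = factor();
        while (peekIs("*")) {
            next();  // consume '*'
            lhs = "(*" + lhs + factor() + ")";
        }
        return lhs;
    }

    std::string expr() {
        std::string lhs = term();
        while (peekIs("+")) {
            next();  // consume '+'
            lhs = "(+" + lhs + term() + ")";
        }
        return lhs;
    }

    std::vector<std::string> tokens;
    std::size_t pos;
};
```

Calling `PrefixParser("1 + 2 * 3").parse()` returns `(+1(*23))`. Each grammar rule is one function, and every token access goes through `peekIs` or `next`, so there is a single place where the index is checked against the vector size.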

Gold