6

I am a computer science student, and so do not have much experience with the C++ language (this is my first semester using it), or with coding for that matter.

I was given an assignment to read integers from a text file in the simple form of:

19 3 -2 9 14 4
5 -9 -10 3
.
.
.

This sent me off on a journey to understand the I/O operators better, since I am required to do certain things with this stream (duh).

I looked everywhere and could not find a simple explanation of how the extraction operator (operator>>) works internally. Let me clarify my question:

I know that operator>> extracts one continuous element until it hits a space, tab, or newline. What I am trying to figure out is where the pointer(?) or read-location(?) will be AFTER it extracts an element. Will it be on the last char of the element just removed, or was that removed and therefore gone? Will it be on the space/tab/'\n' character itself? Perhaps at the beginning of the next element to extract?

I hope I was clear enough. I lack the appropriate jargon to describe my problem more clearly.


Here is why I need to know this (in case anyone is wondering): one of the requirements is to sum all integers in each line separately. I created a loop to extract all integers one by one until it reaches the end of the file. However, I soon learned that operator>> ignores space/tab/newline. What I want to try is to extract an element with >>, and then use inputFile.get() to get the space/tab/newline. Then, if it's a newline, do what I gotta do. This will only work if the stream pointer is in a good position to read the space/tab/newline after the last extraction.


In my previous question, I tried to solve it using getline() and a stringstream.


SOLUTION:

For the sake of answering my specific question, of how operator>> works, I had to accept Ben Voigt's answer as the best one. I also used the other solution suggested here (using a stringstream for each line) and it did work! (You can see it via the link to my previous question.) However, I implemented another solution based on Ben's answer and it also worked:

        .
        .
        .

if(readFile.is_open()) {
        while (readFile >> newInput) {
                char isNewLine = readFile.get();    //get() the next char after extraction

                if(isNewLine == '\n')               //This is just a test!
                        cout << isNewLine;          //If it's a newline, feed a newline.
                else
                        cout << "X" << isNewLine;   //Else, show X & feed a space or tab

                lineSum += newInput;
                allSum += newInput;
                intCounter++;
                minInt = min(minInt, newInput);
                maxInt = max(maxInt, newInput);

                if(isNewLine == '\n') {
                        lineCounter++;
                        statFile << "The sum of line " << lineCounter
                                 << " is: " << lineSum << endl;
                        lineSum = 0;
                }
        }
        .
        .
        .

Regardless of my numerical values, the format is correct! Both spaces and '\n's were caught.

Thank you Ben Voigt :)

Nonetheless, this solution depends heavily on the input format and is very fragile. If any line has anything else before the '\n' (like a space or tab), the code will miss the newline char. Therefore, the other solution, using getline() and a stringstream per line, is much more reliable.

Gil Dekel
  • Seems you want to read whole lines, and then parse them yourself. – Deduplicator Oct 03 '14 at 15:34
  • You're asking about irrelevant implementation details, mostly. `operator>>` extracts a value and advances the stream. If you want more complicated parsing, use a real parser. – Bartek Banachewicz Oct 03 '14 at 15:36
  • Did you try calling `get` and checking what character you are getting? – Marc Glisse Oct 03 '14 at 15:36
  • I must admit, I did not. However, I asked this in order to learn the behavior of operator>> in general and not specifically in my current implementation (which is probably not the best of examples). – Gil Dekel Oct 03 '14 at 15:40
  • @Deduplicator - I actually tried that with a stringstream. But for some reason, it only reads one line correctly and the rest of the lines are zeroed out. See [My previous question](http://stackoverflow.com/questions/26134028/not-getting-all-lines-from-a-text-file-when-using-getlin-in-c) for code examples... – Gil Dekel Oct 03 '14 at 15:45
  • @BartekBanachewicz: This is exactly what I am trying to figure out. Advance the stream WHERE? To the beginning of the next element? And I don't know what a real parser is :/ – Gil Dekel Oct 03 '14 at 15:47
  • It doesn't have to be a concrete "where". Stream is a concept. You are given an interface, and the implementation is pretty much always platform-defined. A "real" parser in my dictionary usually means something like Boost.Spirit(Qi). – Bartek Banachewicz Oct 03 '14 at 15:49
  • Use `while (std::getline(mystream, mystring)) { std::istringstream ss(mystring); int x; while (ss >> x) {} }` – Neil Kirk Oct 03 '14 at 15:54

4 Answers

4

After extraction, the stream's read position is left on the whitespace character that caused extraction to terminate (or on any other character that cannot be part of the value; if no valid value could be parsed at all, the failbit is also set).

This doesn't really matter though, since you aren't responsible for skipping over that whitespace. The next extraction will skip leading whitespace until it finds valid data.

In summary:

  • leading whitespace is ignored
  • trailing whitespace is left in the stream

There's also the noskipws manipulator, which can be used to change the default behavior.

Ben Voigt
2

The operator>> leaves the current position in the file one character beyond the last character extracted (which may be at end of file). That doesn't necessarily help with your problem, since there can be spaces or tabs after the last value in a line. You could skip forward, reading each character and checking whether it is whitespace other than '\n', but a far more idiomatic way of reading line-oriented input is to use std::getline to read the line, then initialize a std::istringstream to extract the integers from the line:

std::string line;
while ( std::getline( source, line ) ) {
    std::istringstream values( line );
    //  ...
}

This also ensures that in case of a format error in the line, the error state of the main input is unaffected, and you can continue with the next line.

James Kanze
  • Thank you for your answer! I actually tried that, but probably messed it up somehow [in my previous question](http://stackoverflow.com/questions/26134028/not-getting-all-lines-from-a-text-file-when-using-getlin-in-c). You can find my clunky code there. It picks up the first line and works on it correctly, and it even detects all the lines in the file properly, but will zero out all the integer values within them. – Gil Dekel Oct 03 '14 at 15:58
  • @GilDekel Your previous question didn't have a C++ label, so I didn't see it until I followed the link. I've answered it as well. The key is to construct a _new_ `std::istringstream` with the line each time you get a new line. – James Kanze Oct 03 '14 at 16:13
1

According to cppreference.com, the standard operator>> delegates the work to std::num_get::get. This takes an input iterator. One of the properties of an input iterator is that you can dereference it multiple times without advancing it. Thus, when a non-numeric character is detected, the iterator is left pointing to that character.

Mark Ransom
  • cppreference.com is a good place to point someone for further reading, but citing it as the basis for an answer seems wrong -- it isn't an authority. Your answer should be factually grounded in what the Standard says, quite apart from whether you find it appropriate to quote from the standard in your answer. – Ben Voigt Oct 03 '14 at 15:58
  • Which is fine, but it doesn't actually tell you where the read pointer will end up in the input stream. (Note that `std::istream_iterator` and `std::istreambuf_iterator` have different behaviors in this respect.) – James Kanze Oct 03 '14 at 15:59
  • @BenVoigt you're free to come up with your own answer. I hardly ever feel the need to go to that extreme, which means I'm not practiced enough to be good at it. – Mark Ransom Oct 03 '14 at 16:13
1

In general, the behavior of an istream is not set in stone. There are multiple flags that change how any istream behaves, which are described in the istream documentation. You should not really care where the internal pointer is; that's why you are using a stream in the first place. Otherwise you'd just dump the whole file into a string or equivalent and inspect it manually.

Anyway, going back to your problem, a possible approach is to use getline to extract each line from the istream into a string. You can then either parse the string manually, or wrap it in a stringstream and extract tokens from there.

Example:

std::ifstream ifs("myFile");
std::string str;

while ( std::getline(ifs, str) ) {
    std::stringstream ss( str );
    double sum = 0.0, value;
    while ( ss >> value ) sum += value;
    // Process sum
}
Svalorzen
  • Thank you. Like I mentioned in my response to @James Kanze, I have tried something similar [in my previous question.](http://stackoverflow.com/questions/26134028/not-getting-all-lines-from-a-text-file-when-using-getlin-in-c) However, I probably messed it up in a silly way since I have never used a stringstream. – Gil Dekel Oct 03 '14 at 16:07
  • @GilDekel I've seen your code there, and it looks like you are messing with the contents of the `stringstream` manually to reset it. Consider building a new one at every loop cycle, as I did here. – Svalorzen Oct 03 '14 at 16:09