
The following (toy) program returns different things when linked against libstdc++ and libc++. Is this a bug in libc++, or do I not understand how istream::eof() works? I have tried running it using g++ on Linux and Mac OS X, and clang on Mac OS X, with and without -std=c++0x. It was my impression that eof() does not return true until an attempt to read (by get() or something else) actually fails. This is how libstdc++ behaves, but not how libc++ behaves.

#include <iostream>
#include <sstream>

int main() {
    std::stringstream s;

    s << "a";

    std::cout << "EOF? " << (s.eof() ? "T" : "F") << std::endl;
    std::cout << "get: " << s.get() << std::endl;
    std::cout << "EOF? " << (s.eof() ? "T" : "F") << std::endl;

    return 0;
}

Thor:~$ g++ test.cpp
Thor:~$ ./a.out
EOF? F
get: 97
EOF? F
Thor:~$ clang++ -std=c++0x -stdlib=libstdc++ test.cpp 
Thor:~$ ./a.out
EOF? F
get: 97
EOF? F
Thor:~$ clang++ -std=c++0x -stdlib=libc++ test.cpp 
Thor:~$ ./a.out
EOF? F
get: 97
EOF? T
Thor:~$ clang++ -stdlib=libc++ test.cpp 
Thor:~$ ./a.out
EOF? F
get: 97
EOF? T
zaphoyd

4 Answers


EDIT: This was due to the way older versions of libc++ interpreted the C++ standard. That interpretation was discussed in LWG issue 2036, where it was ruled incorrect, and libc++ was changed accordingly.

Current libc++ gives the same results on your test as libstdc++.
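Code that has to behave the same on old and current libc++ can sidestep the question entirely by driving its loop with the value get() returns rather than with eof(). A minimal sketch, assuming a narrow `char` stream; `read_all_chars` is a hypothetical name, not a library function:

```cpp
#include <istream>
#include <sstream>
#include <string>

// Hypothetical helper: reads every remaining character of 'in'.
// The loop tests the value returned by get(), not eof(), so the result does
// not depend on whether the library sets eofbit when the last character is
// read (old libc++) or only when a read actually fails (libstdc++, current libc++).
std::string read_all_chars(std::istream& in) {
    using traits = std::istream::traits_type;
    std::string out;
    for (traits::int_type c = in.get();
         !traits::eq_int_type(c, traits::eof());
         c = in.get()) {
        out.push_back(traits::to_char_type(c));
    }
    return out;  // the final get() failed, so eofbit and failbit are now set
}
```

On a `std::istringstream` holding "abc" this returns "abc" with every library. A loop of the form `while (!s.eof()) out.push_back(s.get());`, by contrast, stores one extra, bogus character with libstdc++ but not with old libc++.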

old answer:

Your understanding is correct.

istream::get() does the following:

  1. Calls good(), and sets failbit if it returns false (this adds a failbit to a stream that had some other bit set), (§27.7.2.1.2[istream::sentry]/2)
  2. Flushes whatever's tie()'d if necessary
  3. If good() is false at this point, returns eof and does nothing else.
  4. Extracts a character as if by calling rdbuf()->sbumpc() or rdbuf()->sgetc() (§27.7.2.1[istream]/2)
  5. If sbumpc() or sgetc() returned eof, sets eofbit (§27.7.2.1[istream]/3) and failbit (§27.7.2.2.3[istream.unformatted]/4).
  6. If an exception was thrown, sets badbit (§27.7.2.2.3[istream.unformatted]/1) and rethrows if allowed.
  7. Updates gcount and returns the character (or eof if it couldn't be obtained).

(Section numbers are quoted from C++11, but C++03 has all the same rules, under §27.6.*.)
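The steps above can be sketched with public members only. `sketch_get` is a hypothetical name; the sketch leaves out step 6 (exception handling) and the gcount() update in step 7, because a real implementation does those through the stream's private state:

```cpp
#include <istream>
#include <sstream>

// Hypothetical sketch of the single-character istream::get(), following
// steps 1-5 above. Steps 6 (exception handling) and 7 (gcount) are omitted.
std::istream::int_type sketch_get(std::istream& in) {
    using traits = std::istream::traits_type;

    // Steps 1-3: the sentry sets failbit if good() is false, flushes tie(),
    // and converts to false if the stream is not ready. 'true' = noskipws.
    std::istream::sentry ok(in, true);
    if (!ok)
        return traits::eof();

    // Step 4: extract exactly one character; nothing is read ahead.
    traits::int_type c = in.rdbuf()->sbumpc();

    // Step 5: eofbit (together with failbit) is set only when the
    // extraction itself came back with eof.
    if (traits::eq_int_type(c, traits::eof()))
        in.setstate(std::ios_base::eofbit | std::ios_base::failbit);
    return c;
}
```

Fed the stream from the question (`s << "a"`), the first call returns 'a' (97) with eof() still false, and only the second call returns eof and sets eofbit, which matches the libstdc++ output.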

Now let's take a look at the implementations:

libc++ (current svn version) defines the relevant part of get() as

sentry __s(*this, true);
if (__s)
{
    __r = this->rdbuf()->sbumpc();
    if (traits_type::eq_int_type(__r, traits_type::eof()))
        this->setstate(ios_base::failbit | ios_base::eofbit);
    else
        __gc_ = 1;
}

libstdc++ (as shipped with gcc 4.6.2) defines the same part as

sentry __cerb(*this, true);
if (__cerb)
  {
    __try
      {
        __c = this->rdbuf()->sbumpc();
        // 27.6.1.1 paragraph 3
        if (!traits_type::eq_int_type(__c, __eof))
          _M_gcount = 1;
        else
          __err |= ios_base::eofbit;
      }
[...]
if (!_M_gcount)
  __err |= ios_base::failbit;

As you can see, both libraries call sbumpc() and set eofbit if and only if sbumpc() returned eof.

Your testcase produces the same output for me using recent versions of both libraries.

Cubbi
  • This is weird. I can't find any of the text you're quoting in my versions of the standard (C++03 and N3291): both of my versions say that `get` "Behaves as an unformatted input function. After constructing a sentry object, extracts a character c, if one is available." Nothing about the number of calls to `rdbuf()->sbumpc()` or `rdbuf()->sgetc()`. Although I wouldn't normally expect it, there's nothing illegal about an implementation that makes an additional call to `rdbuf()->sgetc()` and sets `eofbit` because of that. – James Kanze Jan 26 '12 at 08:58
  • Several points on your list of actions: concerning point 2, `istream::get()` doesn't do this; it is part of the actions of the constructor of the `sentry` object. Concerning points 3 and 4, the standard is much less constraining. Extraction must be _as if_ by calling `rdbuf()->sbumpc()` or `rdbuf()->sgetc()` (which is an error, since `rdbuf()->sgetc()` doesn't extract, and `rdbuf()->snextc()` and `rdbuf()->sgetn()`, which do, aren't mentioned). This says nothing about when and if look-ahead occurs. – James Kanze Jan 26 '12 at 09:07
  • @JamesKanze Regarding sentry, the actions of its constructor are part of what `istream::get()` does. Like anything in C++, it's as-if: the implementation may (and sometimes does) do some of what it is supposed to do directly in `get()`. Regarding an additional call to sgetc -- there is nothing illegal in calling it or any other functions, but it would be illegal to set eofbit because of what it returned because it would violate the as-if clause. – Cubbi Jan 26 '12 at 11:34
  • Calling any of the functions in a `streambuf` is observable behavior (since they forward to user-defined virtual functions which may, and typically do, make system calls). And all of the implementations do set `eofbit` any time they see an end of file. If `get()` usually doesn't set it in this particular case, it's because one can implement `get()` without any lookahead. Can, not must. But the standard is singularly silent about lookahead. – James Kanze Jan 26 '12 at 13:26
  • @JamesKanze get() is *specified* to extract one character. Extraction of one character is *specified* as an as-if call to sbumpc/sgetc. A call to sbumpc/sgetc is *specified* to result in eofbit if eof is returned. I agree that the standard doesn't say whether it is **the** call to sbumpc/sgetc or **any call** to sbumpc/sgetc, including unnecessary calls added by a particularly curious input function. – Cubbi Jan 26 '12 at 14:21
  • `get()` is specified to extract one character, in the same way `>> int` is specified to extract the characters making up the `int`. The standard definitely does allow look ahead, however; otherwise, `>> int` could not be implemented. And if the istream encounters EOF on that look ahead, it *will* set `eofbit`; historically, implementations of `filebuf` would not necessarily return EOF if `sgetc` was called a second time. The specified semantics of `eofbit` are "indicates that an input operation reached the end of an input sequence". Which is true if `get()` reads the last byte. – James Kanze Jan 26 '12 at 14:47
  • Given the above, one can argue that reading the last byte must set `eofbit`. I think you can argue that that's what the standard actually says, but that's not the traditional way of interpreting it, and not what current implementations do. Extracting the last byte *may* set `eofbit`, or it may not; it all depends on the implementation (and what you're inputting). – James Kanze Jan 26 '12 at 14:49
  • @JamesKanze I suppose the standard technically allows unnecessary lookaheads under "may use other members", but I fail to see why an implementation would attempt to extract additional characters from the input sequence when not required to do so. That would make `std::cin.get()` impossible. As for `>> int`, the standard is explicit about every call to sbumpc() and sgetc() that it makes, as well as about the conditions where it sets eofbit (even if it effectively repeats the catch-all clause from [istream] in doing so). – Cubbi Jan 26 '12 at 16:04
  • The standard allows lookaheads. Period. It doesn't try to distinguish where they are necessary or not. Logically, of course... I don't see why `get()` would look ahead either, unless it were to explicitly set `eofbit`, and I've never heard of an implementation doing so. Most of the time, probably always, it's pretty obvious whether look ahead is needed or not, but the standard generally leaves the implementation a fair amount of leeway. – James Kanze Jan 26 '12 at 17:10
  • @JamesKanze I finally looked at LWG issues list. [Reading the last character does not set eofbit and the standard says so already](http://www.open-std.org/jtc1/sc22/wg21/docs/lwg-closed.html#2036) -- this was actually the source of the old `libc++` bug. – Cubbi Jan 26 '12 at 17:32
  • ...and of course this was already posted while I was flipping pages of the standard. – Cubbi Jan 26 '12 at 17:59
  • It does look like the committee is moving in the direction of being more explicit; rereading the standard wording (for the nth time), I'm sort of inclined to say that you are right. This does represent a significant break with historical practice (which doubtlessly colored my previous readings of the standard). – James Kanze Jan 27 '12 at 08:28

This was a libc++ bug and has been fixed as Cubbi noted. My bad. Details are here:

http://lwg.github.io/issues/lwg-closed.html#2036

Jake Petroules
Howard Hinnant

The value of s.eof() is unspecified in the second call—it may be true or false, and it might not even be consistent. All you can say is that if s.eof() returns true, all future input will fail (but if it returns false, there's no guarantee that future input will succeed). After failure (s.fail()), if s.eof() returns true, it's likely (but not 100% certain) that the failure was due to end of file. It's worth considering the following scenario, however:

#include <iostream>
#include <sstream>

int main() {
    double test;
    std::istringstream s1("");      // no data at all
    s1 >> test;
    std::cout << (s1.fail() ? "T" : "F") << (s1.eof() ? "T" : "F") << std::endl;
    std::istringstream s2("1.e-");  // malformed exponent
    s2 >> test;
    std::cout << (s2.fail() ? "T" : "F") << (s2.eof() ? "T" : "F") << std::endl;
}

On my machine, both lines are "TT", despite the fact that the first failed because there was no data (end of file), and the second because the floating-point value was incorrectly formatted.
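Since fail() and eof() alone cannot tell those two cases apart, one option is to skip leading whitespace with `std::ws` before extracting: `ws` sets eofbit (but not failbit) when it runs out of characters, so eof() at that point means there was no data at all. A minimal sketch, assuming the stream is good on entry; `ReadResult` and `read_double` are hypothetical names:

```cpp
#include <istream>
#include <sstream>

enum class ReadResult { Ok, NoData, BadFormat };

// Hypothetical helper: reads one double and reports why it failed.
// Assumes 'in' is good on entry. std::ws skips leading whitespace and sets
// eofbit (not failbit) if it reaches the end, so eof() right after it means
// there was nothing left to read.
ReadResult read_double(std::istream& in, double& out) {
    in >> std::ws;
    if (in.eof())
        return ReadResult::NoData;     // e.g. "" or "   "
    if (in >> out)
        return ReadResult::Ok;         // e.g. "3.5" (eofbit may also be set)
    return ReadResult::BadFormat;      // e.g. "1.e-" or "abc"
}
```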

James Kanze
  • How is it unspecified? The standard is clear: set failbit and eofbit if sbumpc()/sgetc() returns eof, set badbit if an exception is thrown. – Cubbi Jan 25 '12 at 16:55
  • @Cubbi No. Set `eofbit` if `sgetc` returns eof, but not necessarily `failbit`; look ahead is always legal, and sometimes necessary. And when and how often exactly `get` calls `sgetc` isn't specified. – James Kanze Jan 25 '12 at 17:20
  • I posted the response as an answer. – Cubbi Jan 25 '12 at 18:09

eofbit is set when an operation tries to read past the end of the file, and that operation does not necessarily fail: if you are reading an integer and there is no end of line after it, I expect eofbit to be set but the read of the integer to succeed. That is, I get, and expect, FT for

#include <iostream>
#include <sstream>

int main() {
    std::stringstream s("12");
    int i;
    s >> i;

    std::cout << (s.fail() ? "T" : "F") << (s.eof() ? "T" : "F") << std::endl;

    return 0;
}

Here I don't expect istream::get to try to read past the character it returns (i.e. I don't expect it to hang until I enter the next line when it reads a \n), so libstdc++ does seem right, at least from a quality-of-implementation (QOI) point of view.

The standard description for istream::get just says "extracts a character c, if one is available" without describing how and so doesn't seem to prevent libc++ behavior.
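The "12" case above generalizes: `>> int` has to look at the character after the last digit to know the number has ended, so eofbit depends on whether that look-ahead hits the end. A small sketch for checking other inputs; `fail_eof_after_int` is a hypothetical name:

```cpp
#include <sstream>
#include <string>

// Hypothetical helper: extracts an int from 'text' and returns the
// fail()/eof() flags as two letters, in the same "FT" style as above.
std::string fail_eof_after_int(const std::string& text) {
    std::istringstream s(text);
    int i = 0;
    s >> i;
    std::string flags;
    flags += s.fail() ? 'T' : 'F';
    flags += s.eof() ? 'T' : 'F';
    return flags;
}
```

With this, "12" gives FT (the look-ahead hits the end), "12\n" gives FF (the look-ahead finds the newline), "" gives TT (nothing to read), and "x" gives TF (no digits, but the end was never reached).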

AProgrammer