7

I'm trying to read in lines from a std::istream but the input may contain '\r' and/or '\n', so std::getline is no use.

Sorry to shout but this seems to need emphasis...

The input may contain either newline type or both.

Is there a standard way to do this? At the moment I'm trying

char c;
while (in >> c && '\n' != c && '\r' != c)
    out .push_back (c);

...but this skips over whitespace. D'oh! std::noskipws -- more fiddling required and now it's misbehaving.

Surely there must be a better way?!?

spraff
  • Are the delimiters mixed in a single file, or will they just vary between files? – jonsca Jul 14 '11 at 14:49
  • I know you could probably do it in 1 pass, but I would do 2 passes, one to change all the end of lines (CR, LF, CRLF) to the `std::endl` (using `in.get()` to read the chars instead of the extraction operator), and then use `getline` on the second pass. – jonsca Jul 14 '11 at 15:49
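The two-pass idea from the last comment could be sketched roughly as follows. The helper names `normalize_newlines` and `read_lines_two_pass` are made up for illustration, and the sketch assumes the whole input fits in memory: pass one rewrites every "\r\n" pair and lone '\r' to '\n', pass two splits with plain std::getline.

```cpp
#include <istream>
#include <iterator>
#include <sstream>
#include <string>
#include <vector>

// Hypothetical helper: returns a copy of `raw` in which every "\r\n" pair
// and every lone '\r' has been replaced by a single '\n'.
std::string normalize_newlines(const std::string &raw) {
    std::string out;
    out.reserve(raw.size());
    for (std::string::size_type i = 0; i < raw.size(); ++i) {
        if (raw[i] == '\r') {
            out += '\n';
            // Treat "\r\n" as one line break by skipping the '\n'.
            if (i + 1 < raw.size() && raw[i + 1] == '\n')
                ++i;
        } else {
            out += raw[i];
        }
    }
    return out;
}

// Hypothetical helper: pass 1 slurps and normalizes the stream,
// pass 2 splits the result with plain std::getline.
std::vector<std::string> read_lines_two_pass(std::istream &in) {
    const std::string raw((std::istreambuf_iterator<char>(in)),
                          std::istreambuf_iterator<char>());
    std::istringstream normalized(normalize_newlines(raw));

    std::vector<std::string> lines;
    std::string line;
    while (std::getline(normalized, line))
        lines.push_back(line);
    return lines;
}
```

This trades memory for simplicity; for large or interactive input a single-pass reader is probably the better fit.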

2 Answers

4

The usual way to read a line is with std::getline.

Edit: If your implementation of std::getline is broken, you could write something similar of your own, something like this:

std::istream &getline(std::istream &is, std::string &s) { 
    char ch;

    s.clear();

    while (is.get(ch) && ch != '\n' && ch != '\r')
        s += ch;
    return is;
}

I should add that technically this probably isn't so much a matter of std::getline being broken as of the underlying stream implementation being broken -- it's up to the stream to translate from whatever characters signify the end of a line for the platform into a newline character. Regardless of exactly which parts are broken, however, if your implementation is broken, this may be able to make up for it (then again, if your implementation is broken badly enough, it's hard to be sure this will work either).

Jerry Coffin
  • Nope. `getline` can't handle ambiguous delimiters. – spraff Jul 14 '11 at 14:22
  • Seems to me that the implementation is aware of the newline character(s) on a particular platform (i.e., standard library functions work fine on Macs, PCs or Linux), otherwise it would be maddening. – jonsca Jul 14 '11 at 14:27
  • Read the question: the input may contain *either* newline type. – spraff Jul 14 '11 at 14:29
  • @spraff: I read the question -- but I also understand how streams (are supposed to) work. In fairness, what it considers the end of a line is implementation defined, but if you're on a platform where either '\r' or '\n' should be treated as the end of a line, then either should be read as a newline (in text/translated mode). – Jerry Coffin Jul 14 '11 at 14:50
  • ISO/IEC 14882:2003 [lib.string.io] defines `getline` as terminating at a **single character** delimiter, there is no awareness of any such concept as translated mode that I can see. My question could be interpreted as "does any such translated mode exist"? – spraff Jul 14 '11 at 15:02
  • @Jerry Coffin: one thing to keep in mind is that some file formats (such as the ever evil PDF) allow and often have mixed line endings in them. So it's not just a platform issue :-/. – Evan Teran Jul 14 '11 at 15:19
  • @spraff: yes, it exists, but you're looking in the wrong place. As I said, it's a characteristic of the underlying stream. The relevant descriptions of the stream, however, are all given in terms of equivalents to mode strings you'd pass to `fopen`, so you need to look in the C standard (§7.19.2/2 in C99). – Jerry Coffin Jul 14 '11 at 16:44
  • @Evan: Okay, fair enough. I guess PDF fits the old joke about platform independent: it's broken equally on all platforms. – Jerry Coffin Jul 14 '11 at 16:48
  • @Jerry: What about for a `std::stringstream`? – Lightness Races in Orbit Jan 17 '12 at 12:09
  • I even run into .cpp files with mixed line endings, because they were edited on different systems and uploaded that way into source control. Visual Studio whines about that. Any source that is foreign to the platform risks having mixed line endings – Swift - Friday Pie Nov 26 '16 at 23:42
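On the std::stringstream question raised in the comments: a string stream does no newline translation, so plain std::getline splits only on '\n' and leaves any '\r' inside the extracted text. A minimal sketch to see that for yourself (the helper name `split_with_std_getline` is made up for illustration):

```cpp
#include <sstream>
#include <string>
#include <vector>

// Hypothetical helper: splits `text` with plain std::getline, which
// stops only at '\n' and treats '\r' as ordinary data.
std::vector<std::string> split_with_std_getline(const std::string &text) {
    std::istringstream in(text);
    std::vector<std::string> lines;
    std::string line;
    while (std::getline(in, line))
        lines.push_back(line);
    return lines;
}
```

Given "this\ris a\ntest\r\nyay", this yields three items ("this\ris a", "test\r" and "yay") rather than four lines.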
4

OK, here's one way to do it. Basically I've made an implementation of std::getline which accepts a predicate instead of a character. This gets you two-thirds of the way there:

template <class Ch, class Tr, class A, class Pred>
std::basic_istream<Ch, Tr> &getline(std::basic_istream<Ch, Tr> &is, std::basic_string<Ch, Tr, A> &str, Pred p) {

    typename std::basic_string<Ch, Tr, A>::size_type nread = 0;

    // Second argument true: do not skip leading whitespace.
    typename std::basic_istream<Ch, Tr>::sentry ok(is, true);
    if (ok) {
        std::basic_streambuf<Ch, Tr> *sbuf = is.rdbuf();
        str.clear();

        while (nread < str.max_size()) {
            const typename Tr::int_type c1 = sbuf->sbumpc();
            if (Tr::eq_int_type(c1, Tr::eof())) {
                is.setstate(std::ios_base::eofbit);
                break;
            }

            ++nread;
            const Ch ch = Tr::to_char_type(c1);
            if (p(ch)) {
                break;              // delimiter: consumed but not stored
            }
            str.push_back(ch);
        }
    }

    // Nothing was extracted, or the string is full: report failure.
    if (nread == 0 || nread >= str.max_size()) {
        is.setstate(std::ios_base::failbit);
    }

    return is;
}

with a functor similar to this:

struct is_newline {
    bool operator()(char ch) const {
        return ch == '\n' || ch == '\r';
    }
};

Now the only thing left is to determine whether you ended on a '\r'. If you did, and the next character is a '\n', just consume it and ignore it.

EDIT: So to put this all into a functional solution, here's an example:

#include <string>
#include <sstream>
#include <iostream>

namespace util {

    struct is_newline { 
        bool operator()(char ch) {
            ch_ = ch;
            return ch_ == '\n' || ch_ == '\r';
        }

        char ch_;
    };

    template <class Ch, class Tr, class A, class Pred>
    std::basic_istream<Ch, Tr> &getline(std::basic_istream<Ch, Tr> &is, std::basic_string<Ch, Tr, A> &str, Pred &p) {

        typename std::basic_string<Ch, Tr, A>::size_type nread = 0;

        // Second argument true: do not skip leading whitespace.
        typename std::basic_istream<Ch, Tr>::sentry ok(is, true);
        if (ok) {
            std::basic_streambuf<Ch, Tr> *const sbuf = is.rdbuf();
            str.clear();

            while (nread < str.max_size()) {
                const typename Tr::int_type c1 = sbuf->sbumpc();
                if (Tr::eq_int_type(c1, Tr::eof())) {
                    is.setstate(std::ios_base::eofbit);
                    break;
                }

                ++nread;
                const Ch ch = Tr::to_char_type(c1);
                if (p(ch)) {
                    break;          // delimiter: consumed but not stored
                }
                str.push_back(ch);
            }
        }

        // Nothing was extracted, or the string is full: report failure.
        if (nread == 0 || nread >= str.max_size()) {
            is.setstate(std::ios_base::failbit);
        }

        return is;
    }
}

int main() {

    std::stringstream ss("this\ris a\ntest\r\nyay");
    std::string       item;
    util::is_newline  is_newline;

    while(util::getline(ss, item, is_newline)) {
        if(is_newline.ch_ == '\r' && ss.peek() == '\n') {
            ss.ignore(1);
        }

        std::cout << '[' << item << ']' << std::endl;
    }
}

I've made a couple minor changes to my original example. The Pred p parameter is now a reference so that the predicate can store some data (specifically the last char tested). And likewise I made the predicate operator() non-const so it can store that character.

Then in main, I have a string in a std::stringstream which has all 3 versions of line breaks. I use my util::getline, and if the predicate object says that the last char was a '\r', then I peek() ahead and ignore 1 character if it happens to be '\n'.

Evan Teran
  • Thanks, I appreciate the effort. I'm staggered that there isn't a famous one-liner for this! – spraff Jul 14 '11 at 15:31