
I'm writing a simple wrapper class for scanning a stream of characters one character at a time.

Scanner scanner("Hi\r\nYou!");
const char* current = scanner.cchar();
while (*current != 0) {
    printf("Char: %d, Column: %d, Line: %d\n", *current, scanner.column(), scanner.line());
    current = scanner.read();
}

C:\Users\niklas\Desktop>g++ main.cpp -o main.exe
C:\Users\niklas\Desktop>main.exe
Char: 72, Column: 0, Line: 0
Char: 105, Column: 1, Line: 0
Char: 13, Column: 0, Line: 1
Char: 10, Column: 0, Line: 2
Char: 89, Column: 1, Line: 2
Char: 111, Column: 2, Line: 2
Char: 117, Column: 3, Line: 2
Char: 33, Column: 4, Line: 2

This example already shows the problem I'm stuck with. \r on its own can be interpreted as a newline, and so can \n on its own. But together (\r\n) they form just a single newline, not two!

The function that processes line- and column-numbers is this:

void _processChar(int revue) {
    char chr = _source[_position];
    if (chr == '\r' or chr == '\n') {
        _line += revue;
        _column = 0;
    }
    else {
        _column += revue;
    }
}

Sure, I could just look at the character after the current position, but I do not check for null termination of the source, because I want to be able to process character streams that may contain \0 characters without stopping at that point. So the character after the current position may not belong to the source at all.

How can I handle CRLF this way?

Edit 1: D'oh! This seems to be working fine. Is this safe in all cases, or do I have an issue somewhere?

void _processChar(int revue) {
    char chr = _source[_position];

    bool is_newline = (chr == '\r' or chr == '\n');
    if (chr == '\n' and _position > 0) {
        is_newline = (_source[_position - 1] != '\r');
    }

    if (is_newline) {
        _line += revue;
        _column = 0;
    }
    else {
        _column += revue;
    }
}

Thanks!

Niklas R

4 Answers


Most modern systems treat \n as the newline for the current target platform, so all of that should happen automatically for you if you just check for \n.

Eric Y
  • That depends on what you want and on where the text is coming from. The i/o systems in C and C++ will translate the host's newline sequence to `'\n'`. If you got the text some other way, or if you're trying to count lines in a file that's not native to the host system, you cannot rely on that translation. – Adrian McCarthy Jun 11 '12 at 17:49
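If you cannot rely on that translation, one common alternative (a different technique from counting on the fly in the scanner) is to normalize line endings once before scanning, so the scanner only has to recognize `'\n'`. The sketch below assumes the whole input is available as a `std::string`, which may contain embedded `'\0'` characters; `normalize_newlines` is a hypothetical name. Note that it changes character offsets, so positions reported afterwards refer to the normalized text.

```cpp
#include <cstddef>
#include <string>

// Hypothetical helper: rewrite "\r\n" and a lone "\r" as "\n", so a scanner
// that only checks for '\n' sees exactly one character per line break.
// std::string carries its own length, so embedded '\0' is kept as data.
std::string normalize_newlines(const std::string& in) {
    std::string out;
    out.reserve(in.size());
    for (std::size_t i = 0; i < in.size(); ++i) {
        char c = in[i];
        if (c == '\r') {
            out += '\n';
            // Skip the '\n' of a CRLF pair; the bound check keeps the peek in range.
            if (i + 1 < in.size() && in[i + 1] == '\n') {
                ++i;
            }
        } else {
            out += c;
        }
    }
    return out;
}
```

For example, `normalize_newlines("a\r\nb\rc\nd")` yields `"a\nb\nc\nd"`: three line breaks, each now a single `'\n'`.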

You may need to keep state inside your stream wrapper -- a stateless wrapper, as you've noticed, simply cannot do this, because every output can (by definition) depend on the previous output.
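A minimal sketch of that idea, assuming the wrapper sees every character exactly once and in order: remember whether the previous character was `'\r'`, and treat a `'\n'` directly after it as the second half of one line break. `LineTracker` and `feed` are hypothetical names, not part of the original `Scanner`. Here `line()` counts the line breaks seen so far and `column()` counts the characters since the last one.

```cpp
#include <cstddef>

// Hypothetical stateful line/column tracker. It never looks ahead or behind
// in a buffer, so it also works for true streams and for data containing '\0'.
class LineTracker {
public:
    // Update the counters for the next character c of the stream.
    void feed(char c) {
        if (c == '\n' && prev_was_cr_) {
            // Second half of "\r\n": the '\r' already started the new line,
            // so neither counter changes.
        } else if (c == '\r' || c == '\n') {
            ++line_;
            column_ = 0;
        } else {
            ++column_;
        }
        prev_was_cr_ = (c == '\r');
    }

    std::size_t line() const { return line_; }      // line breaks seen so far
    std::size_t column() const { return column_; }  // characters since the last break

private:
    std::size_t line_ = 0;
    std::size_t column_ = 0;
    bool prev_was_cr_ = false;  // the one piece of state a stateless wrapper lacks
};
```

Feeding `"Hi\r\nYou!"` leaves `line()` at 1 and `column()` at 4, and `"\r\r\n\n"` counts as three line breaks (CR, CRLF, LF).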

user541686

Your _processChar doesn’t appear to increment the stream read position. Once you change that, you can implement the full newline check:

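void _processChar(int revue) {
    char chr = _source[_position];
    if (chr != '\r' and chr != '\n') {
        _column += revue;
        return;
    }
    // Consume the '\n' of a "\r\n" pair together with the '\r'.
    // Note: this reads _source[_position + 1], so that element must exist.
    if (chr == '\r' and _source[_position + 1] == '\n')
        ++_position;
    _line += revue;
    _column = 0;
}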
Konrad Rudolph
  • `_processChar` is not intended to increment the position. :) As stated above, I want to be able to process *not* null-terminated strings as well. The memory at `_source[_position + 1]` may already be *not part* of the parsed source. Stopping the scanning-process at the correct point is left to the user of the class. – Niklas R Jun 11 '12 at 15:31
  • @NiklasR Effectively, you then *cannot* handle it, since `"\r\n"` simply isn’t a single-char token. You need to change your logic to handle it. – Konrad Rudolph Jun 11 '12 at 15:32
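If the scanner can be told how long the input is (an assumption: the original class deliberately leaves stopping to the caller and stores no length), the lookahead in this answer becomes safe, because the peek can be bounded by the length instead of by a terminating `'\0'`. A sketch with a hypothetical class `BoundedScanner` and an explicit `_length` member:

```cpp
#include <cstddef>

// Hypothetical variant of the lookahead approach. The buffer length is passed
// in explicitly, so peeking at the next character never reads past the end.
class BoundedScanner {
public:
    BoundedScanner(const char* source, std::size_t length)
        : _source(source), _length(length) {}

    bool at_end() const { return _position >= _length; }

    // Consume one logical character; "\r\n" is consumed as a single newline.
    // Precondition: !at_end().
    void advance() {
        char chr = _source[_position];
        if (chr == '\r' || chr == '\n') {
            // Peek only if the next index is inside the buffer.
            if (chr == '\r' && _position + 1 < _length && _source[_position + 1] == '\n') {
                ++_position;  // swallow the '\n' of CRLF
            }
            ++_line;
            _column = 0;
        } else {
            ++_column;
        }
        ++_position;
    }

    std::size_t line() const { return _line; }
    std::size_t column() const { return _column; }

private:
    const char* _source;
    std::size_t _length;
    std::size_t _position = 0;
    std::size_t _line = 0;
    std::size_t _column = 0;
};
```

Because the end is determined by `_length` rather than by a terminator, embedded `'\0'` characters are counted as ordinary columns, and a trailing `'\r'` at the very end of the buffer is handled without an out-of-range read.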

This seems legit to me:

void _processChar() {
    char chr = _source[_position];

    // Treat CRLF as a single new-line
    bool is_newline = (chr == '\r' or chr == '\n');
    if (chr == '\n' and _position > 0) {
        is_newline = (_source[_position - 1] != '\r');
    }

    if (is_newline) {
        _line += 1;
        _column = 0;
    }
    else {
        _column += 1;
    }
}

When a \n is processed, the code checks whether the previous character is a carriage return (\r). If so, the line number is not increased, because the \r has already started the new line.

Also, before it checks the previous character, it tests whether there is actually a previous character (`_position > 0`).
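One thing that may be worth checking in this lookbehind version: the `'\n'` of a CRLF pair is not treated as a newline, so it falls through to the `else` branch and increments `_column`. Assuming `read()` advances `_position` and then calls `_processChar`, as the output in the question suggests, the `Y` in `"Hi\r\nYou!"` would be reported at column 2 instead of 1. A sketch that leaves both counters unchanged for that `'\n'`, using a hypothetical stand-in struct with only the members `_processChar` touches:

```cpp
#include <cstddef>

// Hypothetical minimal stand-in for the asker's scanner: only the fields
// that _processChar reads or writes.
struct LookbehindScanner {
    const char* _source;
    std::size_t _position = 0;
    std::size_t _line = 0;
    std::size_t _column = 0;

    explicit LookbehindScanner(const char* source) : _source(source) {}

    // Update line/column for the character at _position. Only looks
    // backwards, and only when a previous character exists.
    void _processChar() {
        char chr = _source[_position];
        bool second_half_of_crlf =
            chr == '\n' && _position > 0 && _source[_position - 1] == '\r';
        if (second_half_of_crlf) {
            return;  // the '\r' already started the line; touch neither counter
        }
        if (chr == '\r' || chr == '\n') {
            ++_line;
            _column = 0;
        } else {
            ++_column;
        }
    }
};
```

Processing positions 1 through 4 of `"Hi\r\nYou!"` (the way the question's loop does) leaves `_line` at 1 and `_column` at 1 at the `Y`, the same column a lone `\r` or `\n` would give.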

I've removed the int revue argument because I just noticed that what I wanted to achieve is not possible the way I tried to achieve it. I wanted to be able to go backwards in the source, but then I cannot retrieve the column number of the previous line.
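One way to get the column back when stepping backwards, at some cost, is to recompute it by scanning back to the preceding `'\r'` or `'\n'`. That only reads characters before the current position, so it needs neither a terminator nor a length. A sketch with a hypothetical free function `column_at`, using 0-based columns (the first character of a line is column 0); each call costs time proportional to the length of the line.

```cpp
#include <cstddef>

// Hypothetical helper: 0-based column of the character at `position`, i.e. the
// number of characters since the last '\r' or '\n' before it. Reads only
// source[0 .. position - 1].
std::size_t column_at(const char* source, std::size_t position) {
    std::size_t start = position;
    // Walk back until the start of the buffer or just after a line break.
    while (start > 0 && source[start - 1] != '\r' && source[start - 1] != '\n') {
        --start;
    }
    return position - start;
}
```

With `"Hi\r\nYou!"`, `column_at(s, 4)` is 0 for the `Y` and `column_at(s, 7)` is 3 for the `!`.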

Niklas R