Many sources offer a simple C program to count the number of lines in a file. I'm using one of them:

#include <stdio.h>


int main(int argc, char* argv[]) {   
    FILE *file;
    long count_lines = 0;
    char chr;
 
    file = fopen(argv[1], "r");
    while ((chr = fgetc(file)) != EOF)
    {
        count_lines += chr == '\n';
    }
    fclose(file); //close file.
    printf("%ld %s\n", count_lines, argv[1]);
    return 0;
}

However, it fails to count the number of lines in Top2Billion-probable-v2.txt. It stops on the line

<F0><EE><E7><E0><EB><E8><FF>

and outputs

1367044 Top2Billion-probable-v2.txt

when it should output 1973218846 lines. wc -l somehow avoids the problem (and is amazingly faster).

Should I give up on a correct C implementation for counting the lines of a file, or how should I handle the special characters the way wc does?

dizcza
  • `chr` needs to be an int, not a char. See http://c-faq.com/stdio/getcharc.html – Shawn Jul 04 '20 at 11:26
  • `fgetc` does not return a `char` but an `int`. How would you be able to distinguish a `0xFF` from an `EOF` with a `char`? – Gerhardh Jul 04 '20 at 11:26
  • Also try opening the file with `"rb"` instead of just `"r"`. `wc` is probably faster because it reads in larger blocks of data instead of individual characters, and searches these using optimized functions such as `strchr`. – Erlkoenig Jul 04 '20 at 11:29
  • You're right. I should have used `int`; that solves the issue. – dizcza Jul 04 '20 at 11:44
  • Reading in `rb` mode brings no speedup, btw. – dizcza Jul 04 '20 at 11:49
  • And it is strange that typical internet solutions, at least those that I googled, don't mention the caveat that `chr` should be an `int` and not a `char`. – dizcza Jul 04 '20 at 11:52
  • Please post the solution so that I mark it as solved. – dizcza Jul 04 '20 at 11:57
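
Erlkoenig's suggestion above of reading larger blocks can be sketched roughly as follows. This is only an illustration, not a claim about how `wc` is actually implemented: the function name `count_newlines_blocked` and the 64 KiB buffer size are assumptions made for this example, and `memchr` is used in place of the `strchr` mentioned in the comment because a buffer filled by `fread` is not null-terminated.

```c
#include <stdio.h>
#include <string.h>

/* Hypothetical helper (name and buffer size chosen for this sketch).
 * Counts '\n' bytes in an open stream by reading it in blocks.
 * Returns -1 if a read error occurred. */
long long count_newlines_blocked(FILE *fp) {
    char buf[65536];            /* 64 KiB block; an arbitrary choice */
    long long count = 0;
    size_t n;

    while ((n = fread(buf, 1, sizeof buf, fp)) > 0) {
        const char *p = buf;
        const char *end = buf + n;
        /* memchr finds the next '\n' within the bytes actually read */
        while ((p = memchr(p, '\n', (size_t)(end - p))) != NULL) {
            ++count;
            ++p;                /* continue just past the newline found */
        }
    }
    return ferror(fp) ? -1 : count;
}
```

Because the data is handled as raw bytes in a buffer, the `0xFF` versus `EOF` confusion cannot arise here. Whether this is measurably faster than the `fgetc` loop depends on the platform and C library, so it is worth timing on the actual file.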

1 Answer

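fgetc() returns the character read as an unsigned char converted to an int, or EOF on end of file or error. With chr declared as char, the byte 0xFF (the <FF> at the end of the line where the count stops) becomes -1 on platforms where char is signed, which compares equal to EOF (typically -1), so the loop ends early. Declaring chr as int instead of char solves the issue.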
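
A minimal sketch of the counting loop with that change applied, written as a helper function so it can be tried on any open stream. The name `count_newlines` and the `long long` return type are assumptions made for this example, not part of the original program.

```c
#include <stdio.h>

/* Hypothetical helper (name chosen for this sketch).
 * Counts '\n' bytes in an open stream; returns -1 on a read error. */
long long count_newlines(FILE *fp) {
    long long count = 0;
    int ch;   /* int, not char: must hold every unsigned char value plus EOF */

    while ((ch = fgetc(fp)) != EOF) {
        count += (ch == '\n');
    }
    /* EOF is also returned on error, so tell the two cases apart */
    return ferror(fp) ? -1 : count;
}
```

In `main`, one would open `argv[1]` with `fopen`, check the result for `NULL` (the program in the question does not), pass the stream to the helper, and print the count with `%lld`.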

ganjaam