#include <stdio.h>
int main() {
  int c, nl;
  while ((c = getchar()) != EOF) {
    if ( c == '\n') {
      nl++;
      printf("\n%d", nl);
    }
  }
  return 0;
}

Input :
asdsndjkasndasjldk
asdsndjkasndasjldk
asdsndjkasndasjldk
asdsndjkasndasjldk
asdsndjkasndasjldk

Output : 4

The code is supposed to count the number of input lines. However, when I compile and run it, it displays a number lower than the actual number of lines.

EOF being End Of File...

  • So add `1` to the number? – klutt Jul 04 '20 at 23:56
  • `nl` is never initialized. Turn on compiler warnings (and possibly optimization) and your compiler will tell you about this. – Nate Eldredge Jul 04 '20 at 23:56
  • @NateEldredge While turning on warnings is a very good idea, this actually doesn't yield a warning with gcc. clang does, however. – klutt Jul 04 '20 at 23:59
  • @klutt: It does for me: https://godbolt.org/z/mgZz4P. As usual with gcc, you need `-Wall -O` to get warnings about uninitialized variables. Hence my remark about optimization. – Nate Eldredge Jul 04 '20 at 23:59
  • @NateEldredge Ah, I see. Had no idea I needed `-O` for that. I only used `-Wall -Wextra` – klutt Jul 05 '20 at 00:01
  • @NateEldredge clang seems to be smarter here. It says that it is uninitialized, which it is. But gcc only says that it *may be* uninitialized. It fails to see that that indeed always is the case. – klutt Jul 05 '20 at 00:03
  • @NateEldredge Thanks, I forgot to initialize `nl`. I fixed it with `nl = 1;`. – BathtubSeizure Jul 05 '20 at 00:07
  • @klutt: Not if the input file is empty (i.e. if `getchar()` returns `EOF` on the initial run). And `getchar()` is a black box to gcc; for all it knows, it could be a function that always returns `EOF`. So "may" is not wrong, if we want to get picky. Though you're right in general that gcc tends to hedge and say "may"; these warnings are usually generated based on heuristics rather than proof, because proving it is the halting problem. – Nate Eldredge Jul 05 '20 at 00:07
  • @NateEldredge Yeah, you're right. Also, it seems like clang does not have a "may be " warning. I did a test with an if statement. – klutt Jul 05 '20 at 00:09
  • See also [Printing number of lines in a file without using fgets](https://stackoverflow.com/a/60032606/3422102) – David C. Rankin Jul 05 '20 at 03:46
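
The fix discussed in the comments comes down to giving `nl` a defined starting value. A minimal sketch, assuming a hypothetical helper `count_newlines` (not part of the question) that reads from any `FILE *` instead of calling `getchar()` directly:

```c
#include <stdio.h>

/* Hypothetical helper, not from the question: counts the '\n'
 * characters read from `in` until EOF. */
long count_newlines(FILE *in)
{
    int c;        /* int rather than char, so EOF stays distinguishable */
    long nl = 0;  /* explicitly initialized; the question left this indeterminate */

    while ((c = getc(in)) != EOF) {
        if (c == '\n')
            nl++;
    }
    return nl;
}
```

Calling `count_newlines(stdin)` from `main` and printing the result matches the intent of the question's program. With the question's five-line sample the result is 5, as long as the last line is terminated by a newline. As noted in the comments, building the original code with `gcc -Wall -O` reports the uninitialized variable.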

3 Answers


I realized that defining the number of lines was actually a bit trickier than I thought. After some thinking, I would use this algorithm, written as pseudocode:

no_lines = 0
while (c=read_character()) != EOF
    no_lines++
    if c != '\n'
        consume_rest_of_line()

I thought of some cases and what size I "wanted" them to have. The cases are shown below.

0 lines: (obvious)

<EOF>

1 line: (obvious)

Hello<EOF>

1 line: (a little trickier, feels like 1 line, but it also feels like \n should affect things)

Hello\n<EOF>

1 line: (just HAS to be different than an empty file)

\n<EOF>

2 lines: (just HAS to have one more than the above)

\n
\n<EOF>

2 lines: (ok, i think i have it now)

\n
Hello<EOF>

When I looked at this, I realized that the number of lines is almost the number of \n, but not quite. The \n only says that it's time to see whether there is a next line or not. Any character, including \n, may start a line, but a \n always ends the current line, regardless of whether it started it or not.

So I ended up with this code:

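#include <stdio.h>

int main(void)
{
    int c;
    size_t no_lines = 0;
    while ((c = getchar()) != EOF) {
        no_lines++;  /* any character, even '\n', starts a line */
        if (c != '\n') {
            /* consume the rest of the line, up to '\n' or EOF */
            while ((c = getchar()) != EOF && c != '\n')
                ;
        }
    }
    printf("%zu\n", no_lines);
    return 0;
}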

Another way to express it is: "Count the number of \n, and if the last read character is NOT \n, then add one."
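
That alternative formulation can be sketched as follows. This is an illustration rather than the answer's own code: the function name `count_lines` and the variable `last` are hypothetical, and the input is taken as a NUL-terminated string so the cases listed above are easy to check.

```c
#include <stddef.h>

/* Hypothetical helper: counts lines in a NUL-terminated buffer by
 * counting '\n' and adding one if the final character is not '\n'. */
size_t count_lines(const char *text)
{
    size_t lines = 0;
    char last = '\n';  /* treat "nothing read yet" like a finished line,
                          so empty input yields 0 */

    for (const char *p = text; *p != '\0'; p++) {
        if (*p == '\n')
            lines++;
        last = *p;
    }
    if (last != '\n')
        lines++;       /* unterminated final line still counts */
    return lines;
}
```

Starting `last` at `'\n'` is a design choice that keeps the empty input at 0 lines without a separate check.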

klutt
  • `int prev = 0;` (may be cleaner) and `if(prev && prev != '\n')` - to ensure the count is `0` for an empty file? Test with `printf "" | ./yourexe` and see if it works. – David C. Rankin Jul 05 '20 at 04:28
  • @DavidC.Rankin I sort of like initializing to `!'\n'` because it conveys more information about my intentions. I can pick ANY number EXCEPT `\n` and that's the information I want to send to the reader. And I cannot see what good it would do to check if `prev` is zero or not. Seems very unnecessary. – klutt Jul 05 '20 at 14:12
  • Your initialization works -- `!'\n'` is equivalent to `!(0xa)` -- which is `0`, I just scratched my head when I looked at it and the "anything not a `'\n'`" explanation. The need for `if(prev && prev != '\n')` is to ensure `no_lines` remains `0` for an empty-file. As it is currently, you report `1` line for an empty-file. – David C. Rankin Jul 06 '20 at 00:21
  • @DavidC.Rankin Actually, it seems to get stuck in an infinite loop, so I'll delete it and look at it tomorrow – klutt Jul 06 '20 at 01:38
  • No worries, the answer is good -- you just need to tweak it a bit. See [Count Lines in C](https://paste.opensuse.org/70610255) for a quick example of what I'm talking about -- feel free to use all or part. – David C. Rankin Jul 06 '20 at 01:55

There is a semantic problem here. In the Unix world, a line in a text file is terminated by a newline character. A file that does not end in a newline character is not even considered a text file. In contrast, many Windows programs tend to treat the newline character as a line separator.

The program counts the number of newlines in the input. If the input is a text file, that is also the number of lines. If the input was produced by a broken Windows editor, or if you terminated the input before the last newline, the count comes out one short. It works correctly for Unix text files, however.


This is not unique to this program. The POSIX utility wc has a switch -l which is commonly said to count lines, but it too actually counts newline characters in the input! Consider this example:

% printf "abc\nabc\nabc\n" | wc -l
3
% printf "abc\nabc\nabc" | wc -l  
2
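
The difference between the two `wc -l` results can be reproduced in C. The sketch below is only an illustration under stated assumptions: `newline_count` and `has_unterminated_last_line` are hypothetical helper names, and the input is passed as a NUL-terminated string rather than read from a pipe.

```c
#include <stdbool.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical helper: number of '\n' characters in the buffer,
 * which is the quantity the example above shows for `wc -l`. */
size_t newline_count(const char *text)
{
    size_t n = 0;
    for (const char *p = text; *p != '\0'; p++) {
        if (*p == '\n')
            n++;
    }
    return n;
}

/* Hypothetical helper: true if the input is non-empty and its last
 * character is not '\n', i.e. the final line is unterminated. */
bool has_unterminated_last_line(const char *text)
{
    size_t len = strlen(text);
    return len > 0 && text[len - 1] != '\n';
}
```

For `"abc\nabc\nabc\n"` the newline count is 3 and the last line is terminated; for `"abc\nabc\nabc"` the count is 2 and `has_unterminated_last_line` returns true, which is the case where a newline count and an intuitive line count disagree.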

There are two issues here. First, you never initialized nl and so its value is indeterminate.

Second, consider a text file with just one line:

hello

That file contains only one line and yet there is no newline character. You need to account for that (perhaps by initializing nl to 1).

Daniel Walker