Difficulties with an example in section 1.9 of The C Programming Language

Question

I am working my way through the exercises of the first chapter of The C Programming Language, and while I understand most of what is said and shown, there is one example that I don't understand.

Section 1.9 shows a function that returns the length of a line and stores the line's contents in a char array passed as an argument.

int get_line(char s[], int lim)
{
    int c, i, l;

    for (i = 0, l = 0; (c = getchar()) != EOF && c != '\n'; ++i) {
        if (i < lim - 1)
            s[l++] = c;
    }
    if (c == '\n')
        if (l < lim - 1)
            s[l++] = c;
    s[l] = '\0';

    return l;
}

What I do not understand is why we need this: if (c == '\n') {...}. Could it not be combined into the for loop, where we already explicitly check that c is not equal to '\n'? I'm having trouble understanding why this needs to be a separate condition after the loop.

Any light shed would be helpful! Thanks!

It will help us to answer if you suggest an alternative code that does the same without that last condition. — bereal, Oct 30 '21 at 10:26
You'd make it more concise with the `break` keyword. But that's explained in chapter 3, the example is from chapter 1. Writing programming books is not that easy. — Hans Passant, Oct 30 '21 at 10:34

score 2 · Answer 1 · answered Oct 30 '21 at 10:28

The for loop is exited if either c equals EOF or c equals '\n'. Therefore, immediately after the for loop, if you want to know which value c has, you must test.
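
A minimal sketch of that point. To run without keyboard input, it assumes a hypothetical helper `next_char` that reads from a string in place of `getchar()`; `why_loop_stopped` is likewise a name made up for this illustration. The loop condition is the same as in the question, and the return value is whatever `c` holds when the loop ends.

```c
#include <stdio.h>

/* Hypothetical stand-in for getchar(): returns the next character of *src,
   or EOF once the terminating '\0' is reached. */
int next_char(const char **src)
{
    if (**src == '\0')
        return EOF;
    return (unsigned char)*(*src)++;
}

/* Same loop condition as the question's for loop, without storing anything.
   Returns the value c holds when the loop exits: either '\n' or EOF. */
int why_loop_stopped(const char *input)
{
    const char *p = input;
    int c;

    while ((c = next_char(&p)) != EOF && c != '\n')
        ;   /* skip ordinary characters */

    return c;
}
```

Called with "abc\n" it returns '\n'; called with "abc" it returns EOF. The loop condition alone cannot tell the caller which of the two happened, hence the test after the loop.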

Eric Postpischil

chux - Reinstate Monica · Answer 2 · 2021-10-30T20:32:51.863

"... why we need this: if (c == '\n') {...}."

get_line() is structurally:

get_line() {
  initialize

  while get, A not true and B not true
    perform X

  if B
    perform X

  finalize
}

The loop exits under two conditions: A (c == EOF) or B (c == '\n'). In case B, we still want to perform X (store the character) somewhere, because keeping the newline is part of the function's goal.

"Could this not be combined in the for-loop?"

It could be combined, yet then we have 2 locations that exit the loop.

Typical coding guidelines promote a single location to quit the loop. If we set aside that goal, then:

get_line() {
  initialize

  while get, A not true
    perform X
    if B quit the loop

  finalize
}

In C, that looks like the code below: the same number of condition checks, but two loop exit points.

int get_line(char s[], int lim) {
    int c, i, l;

    for (i = 0, l = 0; (c = getchar()) != EOF; ++i) {
        if (i < lim - 1)
            s[l++] = c;
        if (c == '\n')
            break;
    }

    s[l] = '\0';
    return l;
}
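
One way to check that this version behaves like the original is to feed both the same input and compare the results. The sketch below is an illustration, not part of the book's code: it assumes a hypothetical `next_char` helper that reads from a string instead of `getchar()`, and the names `get_line_after` and `get_line_break` are made up here to tell the two variants apart.

```c
#include <assert.h>
#include <stdio.h>

/* Hypothetical stand-in for getchar(): returns the next character of *src,
   or EOF once the terminating '\0' is reached. */
int next_char(const char **src)
{
    if (**src == '\0')
        return EOF;
    return (unsigned char)*(*src)++;
}

/* The question's version: the newline is handled after the loop. */
int get_line_after(const char **src, char s[], int lim)
{
    int c, i, l;

    assert(lim > 0);    /* s[l] = '\0' needs at least one element */
    for (i = 0, l = 0; (c = next_char(src)) != EOF && c != '\n'; ++i) {
        if (i < lim - 1)
            s[l++] = c;
    }
    if (c == '\n')
        if (l < lim - 1)
            s[l++] = c;
    s[l] = '\0';

    return l;
}

/* The break version: the newline is handled inside the loop. */
int get_line_break(const char **src, char s[], int lim)
{
    int c, i, l;

    assert(lim > 0);    /* same precondition as above */
    for (i = 0, l = 0; (c = next_char(src)) != EOF; ++i) {
        if (i < lim - 1)
            s[l++] = c;
        if (c == '\n')
            break;
    }
    s[l] = '\0';

    return l;
}
```

Because neither function touches stdin, both can be driven with the same string, including a line longer than lim - 1 and a last line with no trailing newline, and compared call by call.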

We could contort the code to get the two checks back on the same line and avoid that pesky if (c == '\n') after the loop. Stylistically, this may be harder to follow.

int get_line(char s[], int lim) {
    int c, i, l;

    for (i = 0, l = 0, c = 0; c != '\n' && (c = getchar()) != EOF; ++i) {
        if (i < lim - 1)
            s[l++] = c;
    }
    s[l] = '\0';

    return l;
}

Lastly, the code could use some improvements:

No need for both i and l as index counters. One is enough.
Array sizes and indexes are best typed as size_t. Warning: size_t is an unsigned type.
Putting the size parameter first allows better static code analysis and self-documenting code: it shows that lim relates to s[].
Avoid arithmetic on input parameters so as not to incur overflow. We have more range control over local objects.
Be careful when lim is at an extreme or zero.
Where practical, initialize at declaration rather than assigning afterwards, e.g. int i = 0;

Structurally, the reordered loop is:

get_line() {
  initialize

  while B not true, get, A not true
    perform X

  finalize
}

or, with the improvements listed above applied:

#include <stdio.h>
#include <stdlib.h>
      
size_t get_line(size_t size, char s[size]) {
  int ch = 0;
  size_t i = 0;

  while (ch != '\n' && (ch = getchar()) != EOF) {
    if (i + 1 < size)
      s[i++] = (char) ch;
  }

  // size might have been pathologically 0, so no room for \0
  if (i < size) {
    s[i] = '\0';
  }
  return i;
}
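
The test above is written as `i + 1 < size` rather than `i < size - 1` because `size_t` is unsigned: when `size` is 0, `size - 1` wraps around to the largest `size_t` value. A small sketch of the difference, using two hypothetical helpers (`has_room_subtract` and `has_room_add` are names invented for this illustration):

```c
#include <stddef.h>

/* Hypothetical helper: the int-style test translated directly to size_t.
   When size is 0, size - 1 wraps to the maximum size_t value, so this
   wrongly reports room for almost any i. */
int has_room_subtract(size_t i, size_t size)
{
    return i < size - 1;
}

/* Hypothetical helper: the form used in get_line() above. The arithmetic
   is done on the local index, which only wraps if i is already the
   maximum size_t value. */
int has_room_add(size_t i, size_t size)
{
    return i + 1 < size;
}
```

With `size == 0`, `has_room_subtract(0, 0)` reports room and `has_room_add(0, 0)` does not. For ordinary sizes the two agree.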

I really don't understand how some people think that two exit points in a loop could be less readable. It more clearly expresses what's going on than the original, and the DRY aspect helps readability also. — Elliott, Oct 30 '21 at 11:48
@Elliott I also do not see 2 clear exits as an issue. Yet for learners, multiple exits tend to lead to poor code - so consider a single exit a goal and not a rule. As with many coding practices, they are not "Thou shall not do this" and are more like "Avoid this unless that". — chux - Reinstate Monica, Oct 30 '21 at 11:53
@Elliott On a separate point, the `int ch = 0; ... while (ch != '\n' && (ch = getchar()) != EOF) { ... }` idiom is something I conjured up for this answer. Hope it does not look too weird. I did like how it shortened code. — chux - Reinstate Monica, Oct 30 '21 at 11:56

score 0 · Accepted Answer · answered Oct 30 '21 at 10:29

If you want to put it in the loop, you have to do something like this:

int get_line(char s[], int lim)
{
    int c, i, l;

    for (i = 0, l = 0; (c = getchar()) != EOF; ++i) {
        if ((i < lim - 1) && (c != '\n'))
            s[l++] = c;
        else if (c == '\n') {
            if (l < lim - 1)
                s[l++] = c;
            break;
        }
    }

    s[l] = '\0';

    return l;
}

So, as you can see, moving the condition inside the loop leads to more condition checks and a break statement.

Alaa Mahran

That makes sense! I guess I don't really understand why we need to include the `'\n'` in the first place. But that has more to do with the program than with the actual code. But your example made it very clear that it is desirable to keep it out of the loop. Thanks! – Rein Van Imschoot Oct 30 '21 at 10:33
@ReinVanImschoot Because the code is supposed to read a _line_. So it has to stop when a _new_ line begins (`\n` is the newline character). – CherryDT Oct 30 '21 at 11:40
