8

I'm working on the K&R book. I've read farther ahead than I've done exercises, mostly for lack of time. I'm catching up, and have done almost all the exercises from chapter 1, which is the tutorial.

My issue was exercise 1-18. The exercise is to:

Write a program to remove trailing blanks and tabs from line of input, and to delete entirely blank lines

My code (below) does that, and works. My problem with it is the trim method I implemented. It feels ... wrong ... somehow. Like if I saw similar code in C# in a code review, I'd probably go nuts. (C# being one of my specialties.)

Can anyone offer some advice on cleaning this up -- with the catch that said advice has to only use knowledge from Chapter 1 of K & R. (I know there are a zillion ways to clean this up using the full C library; we're just talking Chapter 1 and basic stdio.h here.) Also, when giving the advice, can you explain why it will help? (I am, after all, trying to learn! And who better to learn from than the experts here?)

#include <stdio.h>

#define MAXLINE 1000

int getline(char line[], int max);
void trim(char line[], char ret[]);

int main()
{
    char line[MAXLINE];
    char out[MAXLINE];
    int length;

    while ((length = getline(line, MAXLINE)) > 0)
    {
        trim(line, out);
        printf("%s", out);
    }

    return 0;
}

int getline(char line[], int max)
{
    int c, i;

    for (i = 0; i < max - 1 && (c = getchar()) != EOF && c != '\n'; ++i)
        line[i] = c;

    if (c == '\n')
    {
        line[i] = c;
        ++i;
    }

    line[i] = '\0'; 
    return i;
}

void trim(char line[], char ret[])
{
    int i = 0;

    while ((ret[i] = line[i]) != '\0')
        ++i;

    if (i == 1)
    {
        // Special case to remove entirely blank line
        ret[0] = '\0';
        return;
    }

    for (  ; i >= 0; --i)
    {
        if (ret[i] == ' ' || ret[i] == '\t')
            ret[i] = '\0';
        else if (ret[i] != '\0' && ret[i] != '\r' && ret[i] != '\n')
            break;
    }

    for (i = 0; i < MAXLINE; ++i)
    {
        if (ret[i] == '\n')
        {
            break;
        }
        else if (ret[i] == '\0')
        {
            ret[i] = '\n';
            ret[i + 1] = '\0';
            break;
        }
    }
}

EDIT: I appreciate all the helpful tips I'm seeing here. I would like to remind folks that I'm still a n00b with C, and specifically haven't gotten up to pointers yet. (Remember the bit about Ch.1 of K&R -- Ch.1 doesn't do pointers.) I "kinda" get some of those solutions, but they're still a touch advanced for where I'm at ...

And most of what I'm looking for is the trim method itself -- specifically the fact that I'm looping through 3 times (which feels so dirty). I feel like if I were just a touch more clever (even without the advanced knowledge of C), this could have been cleaner.

Deduplicator
  • 44,692
  • 7
  • 66
  • 118
John Rudy
  • 37,282
  • 14
  • 64
  • 100
  • Can't find any problem here... – Ilya Oct 02 '08 at 12:12
  • This is obviously nearly 7 years late so not necessarily for you but rather for those who follow - in this case K&R could be interpreted to mean lines that contain only `' '` characters when they say "delete entirely blank lines" since the while loop in the main function will exit if `getline` returns a length of zero (i.e. a line with literally no characters). Reference: the code listing on page 29. – chucksmash Mar 04 '15 at 03:52

9 Answers9

9

If you are sticking with chapter 1, that looks pretty good to me. Here's what I would recommend from a code-review standpoint:

When checking equality in C, always put the constant first

if (1 == myvar)

That way you will never accidentally do something like this:

if (myvar = 1)

You can't get away with that in C#, but it compiles fine in C and can be a real devil to debug.

Eric Z Beard
  • 37,669
  • 27
  • 100
  • 145
  • Whoa, lots of downvotes... haven't had much coffee yet, what's wrong with that? – Eric Z Beard Oct 02 '08 at 12:00
  • no reason to downvote perfectly valid advice, i personally do not bother to do sow but any way perfectly valid voting up. – Ilya Oct 02 '08 at 12:03
  • 1
    I guess some people don't like the (1 == x) syntax. I'm one of them, but I don't feel that it's worth a downvote. – aib Oct 02 '08 at 12:04
  • That syntax is part of the MISRA standard. I like it but I know that the guy I sit next to hates it! – Klelky Oct 02 '08 at 12:06
  • Really? What's wrong with that syntax? As soon as I adopted it I stopped wasting so much time tracking down the missing = sign. – Eric Z Beard Oct 02 '08 at 12:07
  • 4
    It makes code less readable. A better solution is to set the compiler warning level high enough to flag it as a warning. – Ferruccio Oct 02 '08 at 12:09
  • 2
    I agree it is less readable, less natural when you say "if 1 equals myvar". I didn't realize there was a compiler flag, I'd be happier with that. – Eric Z Beard Oct 02 '08 at 12:11
  • 2
    I am going to downvote it again: 1==var it has nothing to do with the "uglines" that the question refers to. I do hate this convention - it make me do the work for compiler, but I am downvoting it as irrelevant and off-topic. –  Oct 02 '08 at 13:26
  • +8/-5 so far. Wow, I didn't think this would be so polarizing. – Eric Z Beard Oct 02 '08 at 13:31
  • +1 relevant, useful, on-topic...unless you don't like writing maintainable C. – user7116 Oct 02 '08 at 17:51
  • If you can remember to write `1 == myvar` then you also should be able to remember to write `myvar == 1` and not just `=`. So I think there is no need to do that! – Sam Feb 10 '13 at 15:09
5

There is no reason to have two buffers, you can trim the input line in place

int trim(char line[])
{
    int len = 0;
    for (len = 0; line[len] != 0; ++len)
        ;

    while (len > 0 &&
           line[len-1] == ' ' && line[len-1] == '\t' && line[len-1] == '\n')
        line[--len] = 0;

    return len;
}

By returning the line length, you can eliminate blank lines by testing for non-zero length lines

if (trim(line) != 0)
    printf("%s\n", line);

EDIT: You can make the while loop even simpler, assuming ASCII encoding.

while (len > 0 && line[len-1] <= ' ')
    line[--len] = 0;
Ferruccio
  • 98,941
  • 38
  • 226
  • 299
  • This is the kind of idea I'm looking for ... But I plugged it in and played with it a bit, and instead of trimming end spaces and blank lines, it's actually inserting extra blank lines. :) – John Rudy Oct 02 '08 at 13:17
  • That's what happens when you type in code first thing in the morning without actually checking it :-) – Ferruccio Oct 02 '08 at 14:18
  • I wrote != instead of == in the original while loop. – Ferruccio Oct 02 '08 at 14:22
  • :) I'm working with this now, on my lunch break ... I'm still having some glitches, but I think I can work those out on my own ... I think this may well be the answer I accept! – John Rudy Oct 02 '08 at 16:42
1

Personally for while constructs:

I prefer the following:

while( (ret[i] = line[i]) )
        i++;

to:

while ((ret[i] = line[i]) != '\0')
        ++i;

They both check against != 0 but the first looks a little cleaner. If the char is anything other thah 0, then the loop body will execute else it will break out of the loop.

Also for 'for' statements, whilst being syntatically valid, I find that the following:

for (  ; i >= 0; --i)

just looks 'odd' to me and indeed is a potential nightmare solution for potential bugs. If I was reviewing this code, it would be like a glowing red warning like. Typically you want to use for loops for iterating a known number of times, otherwise cosider a while loop. (as always there are exceptions to the rule but Ive found that this generally holds true). The above for statement could become:

while (i)
{
        if (ret[i] == ' ' || ret[i] == '\t')
        {
            ret[i--] = '\0';
        }
        else if (ret[i] != '\0' && ret[i] != '\r' && ret[i] != '\n')
        {
            break;
        }
}
TK.
  • 46,577
  • 46
  • 119
  • 147
1

trim() is too big.

What I think you need is a strlen-ish function (go ahead and write it int stringlength(const char *s)).

Then you need a function called int scanback(const char *s, const char *matches, int start) which starts at start, goes down to z as long as the character being scanned at s id contained in matches, return the last index where a match is found.

Then you need a function called int scanfront(const char *s, const char *matches) which starts at 0 and scans forward as long as the character being scanned at s is contained in matches, returning the last index where a match is found.

Then you need a function called int charinstring(char c, const char *s) which returns non-zero if c is contained in s, 0 otherwise.

You should be able to write trim in terms of these.

plinth
  • 48,267
  • 11
  • 78
  • 120
0

First of all:

int main(void)

You know the parameters to main(). They're nothing. (Or argc&argv, but I don't think that's Chapter 1 material.)

Stylewise, you might want to try K&R-style brackets. They're much easier on the vertical space:

void trim(char line[], char ret[])
{
    int i = 0;

    while ((ret[i] = line[i]) != '\0')
        ++i;

    if (i == 1) { // Special case to remove entirely blank line
        ret[0] = '\0';
        return;
    }

    for (; i>=0; --i) { //continue backwards from the end of the line
        if ((ret[i] == ' ') || (ret[i] == '\t')) //remove trailing whitespace
            ret[i] = '\0';

        else if ((ret[i] != '\0') && (ret[i] != '\r') && (ret[i] != '\n')) //...until we hit a word character
            break;
    }

    for (i=0; i<MAXLINE-1; ++i) { //-1 because we might need to add a character to the line
        if (ret[i] == '\n') //break on newline
            break;

        if (ret[i] == '\0') { //line doesn't have a \n -- add it
            ret[i] = '\n';
            ret[i+1] = '\0';
            break;
        }
    }
}

(Also added comments and fixed one bug.)

A big issue is the usage of the MAXLINE constant -- main() exclusively uses it for the line and out variables; trim(), which is only working on them doesn't need to use the constant. You should pass the size(s) as a parameter just like you did in getline().

aib
  • 45,516
  • 10
  • 73
  • 79
0

Here's my stab at the exercise without knowing what is in Chapter 1 or K & R. I assume pointers?

#include "stdio.h"

size_t StrLen(const char* s)
{
    // this will crash if you pass NULL
    size_t l = 0;
    const char* p = s;
    while(*p)
    {
        l++;
        ++p;
    }
    return l;
}

const char* Trim(char* s)
{
    size_t l = StrLen(s);
    if(l < 1)
        return 0;

    char* end = s + l -1;
    while(s < end && (*end == ' ' || *end == '\t'))
    {
        *end = 0;
        --end;
    }

    return s;
}

int Getline(char* out, size_t max)
{
    size_t l = 0;
    char c;
    while(c = getchar())
    {
        ++l;

        if(c == EOF) return 0;
        if(c == '\n') break;

        if(l < max-1)
        {
            out[l-1] = c;
            out[l] = 0;
        }
    }

    return l;
}

#define MAXLINE 1024

int main (int argc, char * const argv[]) 
{
    char line[MAXLINE];
    while (Getline(line, MAXLINE) > 0)
    {
        const char* trimmed = Trim(line);
        if(trimmed)
            printf("|%s|\n", trimmed);

        line[0] = 0;
    }

    return 0;
}
orj
  • 13,234
  • 14
  • 63
  • 73
  • uh, that looks dangerous. What happens if some one calls Trim(" "); You are going to read memory that is outside of the string. And with some bad luck you are going to write on that memory, too. – quinmars Oct 02 '08 at 14:57
  • There may be bugs in that code. I didn't test it super thoroughly. You are right. The while loop condition in Trim() should also be testing that end is greater than s. Assuming strings grow up in memory addresses. – orj Oct 17 '08 at 11:13
0

Personally I'd put code like this:

ret[i] != '\0' && ret[i] != '\r' && ret[i] != '\n'

into a separate function (or even a define macro)

ilitirit
  • 16,016
  • 18
  • 72
  • 111
0
  1. trim should indeed use 1 buffer only (as @Ferruccio says).
  2. trim needs to be broken up, as @plinth says
  3. trim needs not return any value (if you want to check for empty string, test line[0] == 0)
  4. for extra C flavor, use pointers rather than indexes

-go to end of line (terminating 0; -while not at the start of line and current character is space, replace it with 0. -back off one char

char *findEndOfString(char *string) {
  while (*string) ++string;
  return string; // string is now pointing to the terminating 0
}

void trim(char *line) {
  char *end = findEndOfString(line);
   // note that we start at the first real character, not at terminating 0
  for (end = end-1; end >= line; end--) {
      if (isWhitespace(*end)) *end = 0;
      else return;
  }
}
0

Another example of doing the same thing. Did some minor violation by using C99-specific stuff. that will not be found in K&R. also used the assert() function which is part of the starndard library, but is probably not covered in chapter one of K&R.

#include <stdbool.h> /* needed when using bool, false and true. C99 specific. */
#include <assert.h> /* needed for calling assert() */

typedef enum {
  TAB = '\t',
  BLANK = ' '
} WhiteSpace_e;

typedef enum {
  ENDOFLINE = '\n',
  ENDOFSTRING = '\0'
} EndofLine_e;

bool isWhiteSpace(
  char character
) {
  if ( (BLANK == character) || (TAB == character ) ) {
    return true;
  } else {
    return false;
  }
}

bool isEndOfLine( 
  char character
) {
 if ( (ENDOFLINE == character) || (ENDOFSTRING == character ) ) {
    return true;
  } else {
    return false;
  }
}   

/* remove blanks and tabs (i.e. whitespace) from line-string */
void removeWhiteSpace(
  char string[]
) {
  int i;
  int indexOutput;

  /* copy all non-whitespace character in sequential order from the first to the last.
    whitespace characters are not copied */
  i = 0;
  indexOutput = 0;
  while ( false == isEndOfLine( string[i] ) ) {
    if ( false == isWhiteSpace( string[i] ) ) {
      assert ( indexOutput <= i );
      string[ indexOutput ] = string[ i ];
      indexOutput++;
    }
    i++; /* proceed to next character in the input string */
  }

  assert( isEndOfLine( string[ i ] ) );
  string[ indexOutput ] = ENDOFSTRING;

}