
I am writing a text file parser in C.

I would like to read each line of a text file using fgets, except for the very last line, which I would like to skip.

Also, there is no telling how many characters will be in the file or in the last line, but assume my parser only cares about the first LINEMAXLEN characters in each line.

Currently, the only way I can think to do this is by running two loops, something like the following:

char line[ LINEMAXLEN+1u ];
unsigned int nlines;
unsigned int i;

nlines = 0u;
while ( fgets (line, LINEMAXLEN, file) != NULL )
    nlines += 1u;

rewind (file); // return to the start of the file for the second pass
i = 0u;
while ( fgets (line, LINEMAXLEN, file) != NULL ) {
    if ( i >= nlines - 1u )
        break;
    //...parse the line
    i += 1u;
}

But surely, there's got to be a smarter way to do it in only one loop, no?

cafce25
Oh Fiveight
  • If the file ends with a character other than a line terminator, then, presumably, the last line consists of all the characters following the last terminator. (Yes?) But if the file ends with a line terminator, then is the last line the one containing that terminator, or is it an empty line notionally *following* the terminator? – John Bollinger Feb 15 '23 at 17:46
  • Use a pair of alternating buffers, swapping the pointers in each loop. Then you don't overwrite the previous line. Or more brutally, copy the buffer to a 'previous' buffer before you read each line. – Weather Vane Feb 15 '23 at 17:49
  • (e/c with Weather Vane's comment) Have a loop reading lines. At the top of the loop, read a line, and if you get end-of-file, exit out of the loop. At the end of the loop, copy the line you just read to a second variable, `prevline`. Finally, in the middle of the loop (and except on the first trip through the loop) process `prevline`. – Steve Summit Feb 15 '23 at 17:54
  • If this is on Linux, you could seek to the end of the file, get the file position. Then as you read each line you can increment a character counter with the line length. When the counter reaches the file size, you just read the last line. – Barmar Feb 15 '23 at 17:55
  • Note that your example code doesn't work as you seem to intend when any line is longer than `LINEMAXLEN - 1` characters (including any trailing newline). You cannot *completely* ignore the tails of such lines; you need to read and discard each tail in order to get to the next line. – John Bollinger Feb 15 '23 at 17:56
  • By the way, when you call `fgets`, the third argument is normally the exact size of your buffer. I noticed you declared `char line[LINEMAXLEN+1];`, presumably to leave room for the terminating `\0`. But you don't have to worry about that; `fgets` takes it into account. – Steve Summit Feb 15 '23 at 17:56
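
The copy-based variant described in the comments above (keep the line just read in a `prevline` buffer and only handle it once another line arrives) could look roughly like the sketch below. The function name `copy_all_but_last`, the buffer size of 256, and the choice to "process" a line by writing it to an output stream are assumptions made for illustration only. The sketch also assumes that every line fits in the buffer; longer lines would be split into several chunks by `fgets`.

```c
#include <stdio.h>
#include <string.h>

#define LINEMAXLEN 256 /* assumed buffer size, chosen only for this sketch */

/* Hypothetical helper: writes every line of "in" except the last one to
   "out" and returns the number of lines written. Writing to "out" stands
   in for whatever parsing the real program would do. */
unsigned long copy_all_but_last( FILE *in, FILE *out )
{
    char line[LINEMAXLEN];
    char prevline[LINEMAXLEN];
    int have_prev = 0;
    unsigned long processed = 0;

    while ( fgets( line, sizeof line, in ) != NULL )
    {
        /* A new line was read, so the saved line cannot be the last one. */
        if ( have_prev )
        {
            fputs( prevline, out );
            processed++;
        }

        /* Keep the current line until we know whether another one follows. */
        strcpy( prevline, line );
        have_prev = 1;
    }

    /* The line still held in prevline is the last line; it is dropped. */
    return processed;
}
```

Compared with alternating buffers, this costs one `strcpy` per line, which is usually negligible next to the file I/O.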

3 Answers


Instead of using two loops, it would be more efficient to always read one line ahead and to process a line only once the next line has been successfully read. That way, the last line will never be processed.

Here is an example:

#include <stdio.h>
#include <stdlib.h>
#include <string.h>
#include <stdbool.h>

#define LINEMAXLEN 30

//forward function declarations
void process_line( const char *line );
bool read_start_of_line_and_discard_rest( char buffer[], int buffer_size, FILE *fp );

int main( void )
{
    FILE *fp;
    char lines[2][LINEMAXLEN];

    //This index specifies which index in the array "lines"
    //represents the newest line. The other index is the
    //index of the previous line.
    int newest_index = 0;

    //attempt to open file
    fp = fopen( "input.txt", "r" );
    if ( fp == NULL )
    {
        fprintf( stderr, "Error opening file!\n" );
        exit( EXIT_FAILURE );
    }

    //read first line
    if ( !read_start_of_line_and_discard_rest( lines[newest_index], LINEMAXLEN, fp ) )
    {
        fprintf( stderr, "Error reading first line!\n" );
        exit( EXIT_FAILURE );
    }

    //process one line per loop iteration
    for (;;)
    {
        //swap the index, so that the newest line is now the
        //previous line
        newest_index = !newest_index;

        //read the new line
        if ( !read_start_of_line_and_discard_rest( lines[newest_index], LINEMAXLEN, fp ) )
        {
            //we have reached end-of-file, so we don't process the 
            //previous line, because that line is the last line
            break;
        }

        //since reading in a new line succeeded, we can be sure that
        //the previous line is not the last line, so we can process
        //the previous line

        //process the previous line
        process_line( lines[!newest_index] );
    }

    //cleanup
    fclose( fp );
}

//This function will process a line after it has been read
//from the input file. For now, it will only print it.
void process_line( const char *line )
{
    printf( "Processing line: %s\n", line );
}

//This function will read exactly one line of input and remove the
//newline character, if it exists. On success, it will return true.
//If this function is unable to read any further lines due to
//end-of-file, it returns false. If it fails for any other reason, it
//will not return, but will print an error message and call "exit"
//instead.
//If the line is too long to fit in the buffer, it will discard
//the rest of the line and report success.
bool read_start_of_line_and_discard_rest( char buffer[], int buffer_size, FILE *fp )
{
    char *p;

    //attempt to read one line from the stream
    if ( fgets( buffer, buffer_size, fp ) == NULL )
    {
        if ( ferror( fp ) )
        {
            fprintf( stderr, "Input error!\n" );
            exit( EXIT_FAILURE );
        }

        return false;
    }

    //determine whether line was too long for input buffer
    p = strchr( buffer, '\n' );
    if ( p == NULL )
    {
        int c;

        //discard remainder of line
        do
        {
            c = getc( fp );

        } while ( c != EOF && c != '\n' );
    }
    else
    {
        //remove newline character by overwriting it with a null
        //character
        *p = '\0';
    }

    return true;
}

For the input

This is line1.
This is line2 which has an additional length longer than 30 characters.
This is line3.
This is line4.

this program has the following output:

Processing line: This is line1.
Processing line: This is line2 which has an ad
Processing line: This is line3.

As you can see, all lines except the last line are being processed, and only the first LINEMAXLEN-1 (30-1 in my example) characters of each line are being processed/stored. The remaining characters are being discarded.

Only LINEMAXLEN-1 instead of LINEMAXLEN characters from each line are being processed/stored because one character is required to store the terminating null character.
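
To make the off-by-one concrete, here is a small sketch that reports how many characters `fgets` actually stored. The helper name `stored_length` is made up for this illustration and is not part of the program above.

```c
#include <stdio.h>
#include <string.h>

/* Hypothetical helper: reads one chunk with fgets and returns how many
   characters were stored, not counting the terminating null character.
   Returns 0 if nothing could be read. */
size_t stored_length( FILE *fp, char *buffer, int buffer_size )
{
    if ( fgets( buffer, buffer_size, fp ) == NULL )
        return 0;

    /* fgets stores at most buffer_size - 1 characters, then appends '\0'. */
    return strlen( buffer );
}
```

With a 5-byte buffer and the input line `abcdefgh`, the buffer ends up holding `abcd`, so the function returns 4; the rest of the line stays in the stream.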

Andreas Wenzel
  • This not only answered the question, but also addressed another two important concerns pointed out by other comments: (a) I was adding one extra byte to my buffer for no good reason, (b) my code did not handle if lines were longer than `LINEMAXLEN` chars. Thanks a bunch! – Oh Fiveight Feb 15 '23 at 21:29
  • FWIW: in the `if ( p == NULL )` block, if the first character read is `'\n'` or `EOF`, the buffer was not too long for what OP is trying to save. But true or not, this code reasonably handles that case and the _line was too long for input buffer_ case. – chux - Reinstate Monica Feb 15 '23 at 22:14
  • `if ( ferror( fp ) ) { fprintf( stderr, "Input error!\n" ); exit( EXIT_FAILURE ); }` has a slight weakness. `ferror()` returns true when an input error just occurred or if one occurred prior. Consider testing [`!feof()`](https://stackoverflow.com/a/53274876/2410359) instead to distinguish the case where end-of-file just occurred, yet the error indicator was set earlier from the case where an error just occurred. – chux - Reinstate Monica Feb 15 '23 at 22:39
  • @chux: Yes, you are correct that in the case of the function `read_start_of_line_and_discard_rest` being called while the error flag of the stream is set, then it may be better to test for `!feof()` instead of `ferror()`. On the other hand, if the function is called while the error flag is set, then it is likely that this error was overlooked by the calling code. Otherwise, the calling code would probably have called `clearerr` before calling the function. In that case, it may be desirable for the function to catch the error with an appropriate error message, which my code does. [...] – Andreas Wenzel Feb 17 '23 at 10:02
  • @chux: [...] Alternatively, I could call [`clearerr`](https://en.cppreference.com/w/c/io/clearerr) at the start of the function. But I'm not sure if messing with the stream's state flags is a good idea. The calling code should probably be in control of the stream's state flags. – Andreas Wenzel Feb 17 '23 at 10:04
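
For reference, the `!feof()` variant discussed in these comments might be sketched as follows. The enum `read_status` and the function `read_line_status` are hypothetical names introduced only for this sketch. It reports the outcome to the caller instead of calling `exit`, and it assumes `buffer_size` is at least 2.

```c
#include <stdio.h>

/* Hypothetical result codes, introduced only for this sketch. */
enum read_status { READ_OK, READ_EOF, READ_ERROR };

/* Reads one line (or the start of one) with fgets and classifies a failure.
   feof is tested rather than ferror: if end-of-file was just reached, the
   call is reported as READ_EOF even when the error indicator had already
   been set by some earlier operation on the stream. */
enum read_status read_line_status( char *buffer, int buffer_size, FILE *fp )
{
    if ( fgets( buffer, buffer_size, fp ) != NULL )
        return READ_OK;

    return feof( fp ) ? READ_EOF : READ_ERROR;
}
```

Whether a stale error indicator should be reported or ignored is the design question raised above; this sketch ignores it, while the `ferror` test in the answer reports it.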

This is quite simple to do in a single loop if we use alternating buffers [as others have mentioned].

In the loop below we read a line into the "current" buffer. If it is not the first line, we process the previous line, which is still held in the "other" buffer.

By alternating the index into a buffer pool of two buffers, we avoid unnecessary copying.

This introduces a delay in the processing of the buffer. On the last iteration, the last line will be in the current buffer, but it will not be processed.

#define LINEMAXLEN      1000            // line length of buffer
#define NBUF            2               // number of buffers

char lines[NBUF][LINEMAXLEN];           // buffer pool

int previdx = -1;                       // index of bufs for _previous_ line
int curidx = 0;                         // index of bufs for _current_ line
char *buf;                              // pointer to line buffer to process

// read all lines into alternating line buffers
for (;  fgets(lines[curidx],LINEMAXLEN,stdin) != NULL;
    previdx = curidx, curidx = (curidx + 1) % NBUF) {

    // process _previous_ line  ...
    if (previdx >= 0) {
        buf = lines[previdx];
        // process line ...
    }
}
Craig Estey

fgets() will not modify the buffer at all when it hits end-of-file before reading any characters, so just read lines until fgets() returns NULL. The last line read will be retained in the buffer:

#include <stdio.h>

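int main( int argc, char **argv )
{
    //start with an empty string, so that the final printf is
    //well-defined even if the file contains no lines
    char line[ 1024 ] = "";

    if ( argc < 2 )
    {
        fprintf( stderr, "Missing file name argument!\n" );
        return( 1 );
    }

    FILE *f = fopen( argv[ 1 ], "r" );
    if ( NULL == f )
    {
        return( 1 );
    }

    //each successful call overwrites the buffer; the failing call
    //at end-of-file leaves the last line in place
    for ( ;; )
    {
        char *p = fgets( line, sizeof( line ), f );
        if ( NULL == p )
        {
            break;
        }
    }

    printf( "last line: %s\n", line );

    fclose( f );

    return( 0 );
}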

This relies on the required behavior of fgets():

The fgets function returns s if successful. If end-of-file is encountered and no characters have been read into the array, the contents of the array remain unchanged and a null pointer is returned.

Robust code should check for errors with ferror().
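
A rough sketch of what that check could look like, wrapped in a function. The name `read_last_line` and its return convention are assumptions made for this illustration, and it assumes each line fits in the buffer (otherwise the buffer ends up holding only the final chunk of the last line).

```c
#include <stdio.h>

/* Hypothetical helper: leaves the last line of fp in line[].
   Returns 1 if a line was stored, 0 if the stream contained no lines,
   and -1 if a read error occurred (line[] is then indeterminate). */
int read_last_line( FILE *fp, char *line, int size )
{
    int got_line = 0;

    /* Each successful call overwrites line[]; the final failing call at
       end-of-file leaves the most recently read line untouched. */
    while ( fgets( line, size, fp ) != NULL )
        got_line = 1;

    /* fgets also returns NULL on a read error, so tell the two apart. */
    if ( ferror( fp ) )
        return -1;

    return got_line;
}
```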

Working that into your text processing is left as an exercise... ;-)

Andrew Henle