0

The fgets() function has two problems. First, if the line is longer than the buffer passed in, the line is truncated. Second, if the line read from the file has embedded '\0' characters, there is no way to know its actual length. I would like a replacement for fgets() that dynamically allocates the space for the line and also reports the size of the line read. I have written the code for dynamically allocating the space, but I cannot figure out how to get the size of the line read. I am a beginner. Thank you so much.

#include <stdio.h>
#include <stdlib.h>
#include <error.h>
#include <errno.h>

char *myfgets(FILE *fptr, int *size);

char *myfgets(FILE *fptr, int *size) {
    char *buffer;
    char *ret;

    buffer = (char *)malloc((*size) * sizeof(char));
    if (buffer == NULL)
        error(1, 0, "No memory available\n");
    ret = fgets(buffer, *size, fptr);
    if (ret == NULL)
        error(1, 0, "Error in reading the file\n");
    return ret;
}

int main(int argc, char *argv[]) {
    char *file;
    FILE *fptr;
    int size;
    char *result;

    if (argc != 3)
        error(1, 0, "Too many or few arguments <File_name>, <Number of bytes to read>\n");
    file = argv[1];
    size = atoi(argv[2]);
    fptr = fopen(file, "r");
    if (fptr == NULL)
        error(1, 0, "Error in opening the  file\n");
    result = myfgets(fptr, &size);
    printf("The line read is :%s", result);
    free(result);
    return 0;
}
chqrlie
Manikandan
  • 1
    If you want to handle non-ASCII characters like `'\0'`, then you should probably be using `read()`, not `fgets()`. – r3mainer May 07 '20 at 12:18
  • 2
    You cannot expect to handle `\0` in a c-type string in any meaningful way. – Jongware May 07 '20 at 12:20
  • 1
    @r3mainer "... non-ASCII characters like `'\0'`" --> `'\0'` is an [ASCII](https://en.wikipedia.org/wiki/ASCII) character (ASCII NUL). `read()` does not stop reading when `'\n'` read. so does not meet requirements either. – chux - Reinstate Monica May 07 '20 at 14:45
  • @Manikandan: you can accept one of the answers by clicking on the grey checkmark below its score. – chqrlie May 07 '20 at 21:27

3 Answers

2

Use getline(3) to read a complete line of unknown length. It allocates memory as needed to hold it all.

The function can also deal with null bytes ('\0') in the line being read. From the linked man page (emphasis added):

On success, getline() and getdelim() return the number of characters read, including the delimiter character, but not including the terminating null byte ('\0'). This value can be used to handle embedded null bytes in the line read.

So you just have to save its return value instead of using strlen().
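As a minimal sketch of that advice, assuming a POSIX system where getline() is available: the helper name `count_lines` is made up for illustration and is not part of any library. It keeps the value returned by getline() as the line length, so null bytes inside a line are still counted.

```c
#define _POSIX_C_SOURCE 200809L  /* expose getline() in <stdio.h> */
#include <stdio.h>
#include <stdlib.h>
#include <sys/types.h>           /* ssize_t */

/* Hypothetical helper: count the lines in `fp` and add up their lengths,
 * using the return value of getline() rather than strlen(). */
static int count_lines(FILE *fp, size_t *total_bytes) {
    char *line = NULL;   /* getline() allocates and grows this buffer */
    size_t cap = 0;      /* current capacity of `line` */
    ssize_t len;         /* bytes read, including '\n' and any '\0' bytes */
    int lines = 0;

    *total_bytes = 0;
    while ((len = getline(&line, &cap, fp)) != -1) {
        *total_bytes += (size_t)len;
        lines++;
    }
    free(line);          /* the same buffer is reused by every call */
    return lines;
}
```

Note that getline() returns -1 both at end of file and on error, so a caller that needs to tell them apart would still check feof() or ferror() afterwards.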

Shawn
2

You have correctly identified 2 issues in fgets(), but your proposed alternative does not address either of them as you still call fgets().

Instead, write a loop that calls getc() repeatedly until it returns EOF or '\n', storing each byte read into an allocated array and reallocating as needed.

Here is a simplistic version:

// Read a full line from `fptr`
// - return `NULL` at end of file or upon read error like `fgets()`.
// - otherwise return a pointer to an allocated array containing the
//   characters read, up to and including the newline and a null terminator.
// - store the number of bytes read into *plength.
// - the buffer is null terminated, and it may contain embedded null bytes
//   if such bytes were read from the file
char *myfgets(FILE *fptr, size_t *plength) {
    size_t length = 0;
    char *buffer = NULL, *newp;
    int c;

    for (;;) {
        if ((c = getc(fptr)) == EOF) {
            if (!feof(fptr)) {
                /* read error: discard data read so far and return NULL */
                free(buffer);
                buffer = NULL;
                length = 0;
            }
            break;
        }
        if ((newp = realloc(buffer, length + 2)) == NULL) {
            free(buffer);
            error(1, 0, "Out of memory for realloc\n");
            return NULL;
        }
        buffer = newp;
        buffer[length] = c;
        length++;
        if (c == '\n')
            break;
    }
    if (length != 0) {
        buffer[length] = '\0';
    }
    *plength = length;
    return buffer;
}
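Because the returned buffer may contain embedded null bytes, a caller has to rely on the stored length rather than on `%s` or strlen(). A small sketch of the caller side, using a hypothetical helper `print_line` that is not part of the answer above:

```c
#include <stdio.h>

/* Hypothetical helper: write `len` bytes of `buf` to `out`, showing each
 * null byte as the two characters \0 so that it stays visible.
 * Returns the number of null bytes found. */
static size_t print_line(const char *buf, size_t len, FILE *out) {
    size_t nuls = 0;

    for (size_t i = 0; i < len; i++) {
        if (buf[i] == '\0') {
            fputs("\\0", out);   /* printf("%s") would have stopped here */
            nuls++;
        } else {
            fputc(buf[i], out);
        }
    }
    return nuls;
}
```

With a line obtained as `line = myfgets(fptr, &length)`, the call would be `print_line(line, length, stdout)` followed by `free(line)`.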
chqrlie
  • 1
    Nice. Note that `fgets()` returns `NULL` when `ferror(stdin)` is true, so a little more work needed to handle the rare corner case. – chux - Reinstate Monica May 07 '20 at 14:42
  • @chux-ReinstateMonica: indeed `fgets()` returns `NULL` at end of file or in case of a read error. The above function does the same as `getc()` will return `EOF` in the same cases. `NULL` is returned if no characters have been stored into the destination array, just like `fgets()` does. – chqrlie May 07 '20 at 15:45
  • @chux-ReinstateMonica: Actually, the behavior on read error is different: `NULL` should be returned even if bytes have already been stored in the array. Answer updated. (I should update my own C library too :) – chqrlie May 07 '20 at 15:51
  • 1
    Even nicer. Note: Pedantically there is an advantage of `if (!feof(fptr))` over `if (ferror(fptr))` as delved into [here](https://stackoverflow.com/q/53272650/2410359) as it is a better test when both error and end-of-file flags set. – chux - Reinstate Monica May 07 '20 at 16:23
1

Various approaches for a "fixed" fgets():

1) Use getline(), as suggested by @Shawn. It is not part of the standard C library, but it is commonly available on *nix, and source code is easy enough to find. Unfortunately, it obliges a new type: ssize_t.

2) Roll your own getc() loop, as @chqrlie does. Corner cases can be tricky.

3) Repeatedly call fgets() as needed. Pre-fill the buffer with '\n' before each call, then find the first '\n' in the buffer: its position and the character that follows it determine the length read. (There are only a few cases to consider.)

4) Repeatedly call scanf("%99[^\n]%n", buf100, &n) and getc() for the '\n' as needed. Look at the return value and n to determine length.

5) Likely others
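Approach 3 can be sketched for a single fgets() call as follows. The helper name `fgets_len` is hypothetical, and the sketch assumes that fgets() leaves the bytes after its null terminator untouched, which common implementations do but which the C standard does not spell out.

```c
#include <limits.h>
#include <stdio.h>
#include <string.h>

/* Hypothetical helper: like fgets(), but also stores in *len the number of
 * bytes actually read, even if the data contains '\0'. Requires n >= 2. */
static char *fgets_len(char *buf, size_t n, FILE *fp, size_t *len) {
    *len = 0;
    if (n < 2 || n > INT_MAX)
        return NULL;
    memset(buf, '\n', n);                 /* pre-fill with the sentinel */
    if (fgets(buf, (int)n, fp) == NULL)
        return NULL;                      /* end of file or read error */

    char *nl = memchr(buf, '\n', n);
    if (nl == NULL) {
        /* No sentinel left: the buffer is full, n - 1 bytes were read. */
        *len = n - 1;
    } else {
        size_t p = (size_t)(nl - buf);
        if (p + 1 < n && buf[p + 1] == '\0')
            *len = p + 1;                 /* this '\n' came from the file */
        else
            *len = p - 1;                 /* leftover pre-fill: the terminator
                                             sits just before it */
    }
    return buf;
}
```

A full replacement would call this in a loop, appending each chunk to a growing allocation until a chunk ends with '\n' or the call returns NULL.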

A good functional test of the design is how well it reports these cases:

  • Happy path: a line was read, memory allocated, no problems.

  • End-of-file: Nothing read due to end of file.

  • Out-of-memory.

  • Input error occurred.

Other considerations:

  • Do you really want to save a '\n'?

  • Performance.


As for "dynamically allocates the space": with no limit, the code lets a nefarious user overwhelm memory resources by entering a pathologically long line. Rather than give a user that ability, I recommend limiting input to a sane bound. Excessively long input is an attack that should be detected, not enabled.

So I would start with

char *myfgets(FILE *fptr, size_t limit, size_t *size) {
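One way such a bounded function could look is sketched below. This is only an illustration under assumed semantics, not the answer's actual implementation: the name `myfgets_limited`, the starting capacity of 64, and the choice to return NULL for every failure are all assumptions.

```c
#include <stdio.h>
#include <stdlib.h>

/* Hypothetical sketch: read one line of at most `limit` bytes (newline
 * included). Returns an allocated, null-terminated buffer and stores its
 * length in *size. Returns NULL at end of file, on a read error, on
 * allocation failure, or when the line exceeds `limit`. */
static char *myfgets_limited(FILE *fptr, size_t limit, size_t *size) {
    size_t length = 0, capacity = 0;
    char *buffer = NULL;
    int c;

    *size = 0;
    while ((c = getc(fptr)) != EOF) {
        if (length == limit) {            /* line too long: reject it */
            free(buffer);
            return NULL;
        }
        if (length + 1 >= capacity) {     /* keep room for the terminator */
            size_t newcap = capacity ? capacity * 2 : 64;
            char *newp = realloc(buffer, newcap);
            if (newp == NULL) {
                free(buffer);
                return NULL;
            }
            buffer = newp;
            capacity = newcap;
        }
        buffer[length++] = (char)c;
        if (c == '\n')
            break;
    }
    if (c == EOF && (!feof(fptr) || length == 0)) {
        free(buffer);                     /* read error, or nothing to read */
        return NULL;
    }
    buffer[length] = '\0';
    *size = length;
    return buffer;
}
```

Doubling the capacity keeps the number of realloc() calls logarithmic in the line length. A production version would probably report which of the failure cases occurred instead of folding them all into NULL, and would decide what to do with the unread remainder of an over-long line.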
chux - Reinstate Monica