
I am using fgets() to read lines from popen("ps -ev", "r") and I cannot find out how to know if fgets() reads a line partially or fully, and if partially how to read/throw away the excess.

For each line from popen(), I read at most 1023 characters into a 1024-byte buffer and extract the information I need. That works fine. The problem comes when a line is longer than the buffer. The next fgets() call then returns the continuation of the previous line, which is not in the format I need (I need the value of each column at the beginning of each line). If I can tell that I only read part of a line, I want to read and throw away the rest of it, a buffer at a time, until I reach its end. After that, the next fgets() call will start at the beginning of the next line instead of in the middle of the previous one.

I know that fgets() reads until it either finds a newline or reaches the provided limit, and that the next call continues with the rest of the line. I have tried checking that the last character of the buffer is '\0' and that the second-to-last character is not '\n', but that does not work. The code is below in case it helps.

If you run the code, you will see `LINE: num S num:num.num ...` (where num is a number), which is what each line should begin with. Some lines instead look something like `LINE: AAAAAAQAAABMAAAAQAAAAAAAAAAMAAAAFAAAAEAAAAAAAAAADAAAACwAAABA....`. These are the leftover parts of the previous line. They cause the problem because they are not in the expected format.

Any and all help is highly appreciated.

```c
#include <stdlib.h>
#include <stdio.h>
#include <string.h>
#include <math.h>

#define NEWLINE() printf("\n");
#define DIVIDER() printf("============================================================================\n");
#define PL(l) printf("LINE: %s\n", l);

int const MAX_PROCESSES = 20;
int const BUFFER_SIZE = 1024;

int exhaustedLine(char* line) {
    if (line[sizeof line - 1] == '\0' && line[sizeof line - 2] != '\n') {
        printf("n:%c 0:%c\n", line[sizeof line - 2], line[sizeof line - 1]);
        NEWLINE();
        return -1;
    }
    return 0;   
}

int main(int argc, char const *argv[]) {
    FILE* fp = popen("ps -ev", "r");
    char buf[BUFFER_SIZE];
    char* line = (char*)1;

    while (line) {
        DIVIDER();
        line = fgets(buf, BUFFER_SIZE, fp);
        PL(line);
        if (exhaustedLine(line) != 0) {
            printf("END OF LINE\n");
        }
    }

    return 0;
}
```
melpomene
dmoini
  • If you read just about any `fgets` documentation or reference (for example [this one](https://en.cppreference.com/w/c/io/fgets)), it will tell you that if the full line was read, the last character would be a newline. – Some programmer dude Apr 24 '19 at 06:06
  • @Someprogrammerdude Why don't you consider that an answer? – Yunnosch Apr 24 '19 at 06:08
  • In `exhaustedLine`, `sizeof line` is the size of the char pointer, so it's not what you want. If the line was not terminated by a newline, `line[n - 1]` will hold the null terminator and `line[n - 2]` will hold a non-null character other than a newline. – M Oehm Apr 24 '19 at 06:08
  • @MOehm Same to you. – Yunnosch Apr 24 '19 at 06:09
  • @Yunnosch: Yes, I know. Too late now. – M Oehm Apr 24 '19 at 06:24
  • @Someprogrammerdude If EOF is reached, the last line will be fully read but will not be terminated by a newline. Some care is needed if that happens and if the last line happens to be 1 byte shorter than the buffer size. – jamesdlin Apr 24 '19 at 06:25
  • @jamesdlin Of course that requires that the last line isn't terminated by a newline. Such text files should be banished! ;) – Some programmer dude Apr 24 '19 at 06:39

2 Answers


You have the right idea: If a complete line was read, the buffer contains a newline. Otherwise the line is either longer than the buffer size or we are at the end of the file and the last line was unterminated.

The main problem with your implementation is `char* line` ... `sizeof line`. `sizeof` yields the size of the type of its operand expression, so `sizeof line` means `sizeof (char *)`. That is the size of a pointer, not the size of the array `line` is pointing into.

Also, if a shorter line was read, then `line[SIZE - 1]` would access uninitialized memory.

Easiest solution:

```c
#include <string.h>

int is_full_line(const char *line) {
    return strchr(line, '\n') != NULL;
}
```

Just use `strchr` to search the string for `'\n'`.

To throw away the rest of an overlong line, you have several options:

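  • You could call `fgets` again in a loop until the chunk it returns contains a newline.
  • You could call `fgetc` in a loop: `int c; while ((c = fgetc(fp)) != EOF && c != '\n') {}`
  • You could use `fscanf`: `fscanf(fp, "%*[^\n]"); fscanf(fp, "%*1[\n]");` These must be two separate calls. If the rest of the line is empty, the first conversion fails, and a single combined format string would stop before consuming the newline.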

Regarding

```c
int const BUFFER_SIZE = 1024;
```

Note that `const` does not declare constants in C; it declares read-only variables. `char buf[BUFFER_SIZE]` is therefore a variable-length array, because its size is not an integer constant expression.

To get a true integer constant in C, you can use an `enum` instead:

```c
enum { BUFFER_SIZE = 1024 };
```
melpomene
  • `#define BUFFER_SIZE 1024` will also work, and is a more idiomatic way of defining constants. But the enum approach plays better with symbolic debuggers, since the `BUFFER_SIZE` symbol exists at compile time, whereas the #define approach substitutes `BUFFER_SIZE` with the literal number during the preprocessing phase, so the debugger will not recognize the symbol. – Ray Apr 24 '19 at 06:29
  • @Ray I'd say that using a `#define` is more common, but I wouldn't say it's idiomatic (or at least, I wouldn't say that it's a *good* idiom). – jamesdlin Apr 24 '19 at 06:37
  • `is_full_line` won't work for the last line if there's no terminating newline. (I suppose one might argue whether that's a "full line".) – jamesdlin Apr 24 '19 at 06:38
  • @jamesdlin As I wrote ("*... or we are at the end of the file and the last line was unterminated*"), that is indeed not a full line. – melpomene Apr 24 '19 at 06:40
  • Pedantically, the various approaches here of finding if "the buffer contains a newline" are broken. A line of 8 characters `"abc\0xyz\n"` will incorrectly be assessed as not containing a `'\n'`. – chux - Reinstate Monica Apr 24 '19 at 13:57

Your problem is this bit:

```c
line[sizeof line - 1]
```

`line` in this case is a `char*`, so `sizeof line` evaluates to the size of the pointer, not the size of the string. You need to do something like this:

```c
size_t len = strlen(line);
if (len && '\n' == line[len - 1]) ...
```

You don't need to test that `line[len] == '\0'`; that is true for all strings. (Not for all character arrays, mind you, but any standard library function that returns a string will return a null-terminated array.)

Ray
  • `strlen(line) - 1` is a route into disaster. – alk Apr 24 '19 at 06:48
  • @alk In general, yes; in this case `line` cannot be empty (after a successful call to `fgets`), but it shouldn't be used in a general `exhaustedLine` function. – melpomene Apr 24 '19 at 06:59
  • @melpomene Agree that a line can not be empty after a successful call to `fgets()`, yet a successful read can begin with a _null character_ and so `strlen(line) - 1` is a [route to disaster](https://stackoverflow.com/questions/55823341/c-fgets-how-to-tell-if-line-is-greater-than-specified-size#comment98313152_55823434) as that evaluates to `SIZE_MAX` – chux - Reinstate Monica Apr 24 '19 at 14:00
  • @chux, alk: Quite right. `fgets` returns NULL if it fails to read any characters, but if the input stream actually *contains* a `'\0'`, `fgets` will place \0\0 in the buffer and return non-NULL, and `strlen(line)` will be 0. I've edited the answer accordingly. – Ray Apr 24 '19 at 17:48