
Hello, I am very new to the C programming language and I am writing my first program in C. I want to remove the '\n' from the end of a line read with getline() and replace it with '\0'. I tried it with the code in the if-statement (the first statement in my while loop), but I am not getting the behaviour I intended. I tried the solution suggested in "Function to remove newline has no effect?", but it did not work in my case and I don't understand why. I assume I am making a mistake with the pointers, but I cannot figure out what it is. What exactly am I doing wrong, and how can I fix it?

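#define _POSIX_C_SOURCE 200809L  /* getline() is POSIX, not standard C */
#include <stdio.h>
#include <stdlib.h>
#include <string.h>
#include <sys/types.h>           /* ssize_t */

void foo(FILE *input, FILE *output) {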
    char *line = NULL;
    size_t length = 0;
    ssize_t chars_read;

    while ((chars_read= getline(&line, &length, input)) != -1) {
        if (line[chars_read-1] == '\n') {
            line[chars_read-1] = '\0';
            chars_read = chars_read - 1;
        }
        char *line_rev = malloc(sizeof(char)*chars_read);
        bar(line, line_rev, chars_read);
        if (strcmp(line, line_rev) == 0) {
            ...
        } else {
            ...
        }
        free(line_rev);
    }

    free(line);
}

Update: Thanks for all the helpful answers! For future visitors: be careful when working on WSL. Files created on Windows end their lines with "\r\n" instead of "\n", so after you remove the '\n' a '\r' may still be left at the end of the line. For details, check Ted's answer ;).

CoffeeKid
  • Hint: make it *two* functions. Divide&conquer. [ &test ...] – wildplasser Nov 14 '21 at 01:27
  • "I am not getting the behavior I intended", meaning what? Please show an example of what you see and what you'd expect instead. – Bart Nov 14 '21 at 01:28
  • You have a length variable that is unused, and you haven't defined len anywhere (it's passed into getline). – Luke Nelson Nov 14 '21 at 01:29
  • Is this on Linux / MacOS, or on Windows? On Windows, newlines can also use `"\r\n"`, so maybe there's an `'\r'` left? – Bart Nov 14 '21 at 01:30
  • @LukeNelson That was a typo; it is fixed now. I use the length variable in the getline() function. – CoffeeKid Nov 14 '21 at 01:31
  • @Bart I am on Windows using WSL (Ubuntu) – CoffeeKid Nov 14 '21 at 01:33
  • `line[strcspn(line, "\r\n")] = 0;` is one nice approach if you don't know the number of chars read, btw. But you do in this case, so there are better ways. What you're doing looks fine. – Shawn Nov 14 '21 at 01:35
  • @Shawn Well, that worked, thank you! Gonna look into what the function does in detail. – CoffeeKid Nov 14 '21 at 01:45
  • The line `char *line_rev = malloc(sizeof(char)*chars_read);` is dubious; you have not allocated space for the null that will need to be added to the end of the string. You need to allocate `chars_read + 1` bytes. Multiplying by `sizeof(char)` is the same as multiplying by `1` and is not particularly helpful. – Jonathan Leffler Nov 14 '21 at 01:46
  • I suspect @Bart is on to something and carriage returns from a Windows style file are messing you up (hence including it in the `strcspn()` characters). Common issue with WSL environments. – Shawn Nov 14 '21 at 02:33
  • @Shawn Yes, I think that is indeed the case, since I tried Ted Lyngmo's code snippet without the "special care for windows line endings" part and it did not work ;). – CoffeeKid Nov 14 '21 at 02:44
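For readers who want Shawn's suggestion in reusable form, here is a minimal sketch. It relies only on standard `strcspn()`, which returns the length of the initial segment of the string that contains none of the listed characters: the index of the first `'\r'` or `'\n'`, or the string's length if there is neither. The helper name `chomp` is hypothetical, and the sketch assumes `line` is an ordinary null-terminated string.

```c
#include <string.h>

/* Hypothetical helper: cut the string at the first '\r' or '\n'.
 * strcspn() gives the index of the first CR/LF, or strlen(line)
 * when there is none, so writing '\0' there is always in bounds. */
void chomp(char *line)
{
    line[strcspn(line, "\r\n")] = '\0';
}
```

Because it cuts at the *first* CR or LF, it also truncates a line that happens to contain a stray `'\r'` in the middle, which is usually acceptable for text input.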

2 Answers


What you need is simply to put a \0 where the \n is.

It could look like this:

char *line = NULL;
size_t length = 0;
ssize_t chars_read;
// ...

    if(chars_read > 0 && line[chars_read-1] == '\n') {
        line[chars_read-1] = '\0';
        // special care for windows line endings:
        if(chars_read > 1 && line[chars_read-2] == '\r') line[chars_read-2] = '\0';
    }
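A minimal sketch of the same idea as a reusable function, assuming the caller still has the count returned by `getline()`. The name `strip_line_ending` is hypothetical. Unlike the snippet above, it returns the updated length, so a count such as `chars_read` stays in sync with the shortened string (the question passes `chars_read` on to `bar()`).

```c
#include <sys/types.h>  /* ssize_t (POSIX) */

/* Hypothetical helper: strip a trailing "\n" or "\r\n" from a line
 * whose length, as returned by getline(), is `len`.
 * Returns the new length of the string. */
ssize_t strip_line_ending(char *line, ssize_t len)
{
    if (len > 0 && line[len - 1] == '\n') {
        line[--len] = '\0';
        /* A file written on Windows may also leave a '\r' before it. */
        if (len > 0 && line[len - 1] == '\r')
            line[--len] = '\0';
    }
    return len;
}
```

In the question's loop it would be used as `chars_read = strip_line_ending(line, chars_read);`.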
Ted Lyngmo
  • @TedLyngmo I tried your approach too but sadly I had the same result as with my initial if statement, the "\n" did not get replaced. – CoffeeKid Nov 14 '21 at 01:57
  • @CoffeeKid: How did you establish that the newline was not deleted? I'd use something like `printf("[[%s]]\n", line);` to see: if the `]]` appears on a new line, there is a newline in the data; if not, it was removed. If the `]]` appears at the start of the line, over the rest of the data, there is a carriage return `'\r'` in the data. – Jonathan Leffler Nov 14 '21 at 01:59
  • @CoffeeKid I suggest that you check what `line` actually contains. Print out _all_ its contents. – Ted Lyngmo Nov 14 '21 at 01:59
  • @TedLyngmo I did that by writing the contents to an output file and the string still contained a newline. – CoffeeKid Nov 14 '21 at 02:02
  • @CoffeeKid If it did, it magically was not at the end of the string returned by `getline`. Try this after my `if(...) { .. }`: `for(char* ch = line; *ch != '\0'; ++ch) printf("%d\n", *ch);` - What does that show? – Ted Lyngmo Nov 14 '21 at 02:05
  • 97 98 99 99 98 97 is the output for it with the input "abccba" with a newline at the end. – CoffeeKid Nov 14 '21 at 02:13
  • @CoffeeKid ... there is no newline at the end there. `97` is `a`. – Ted Lyngmo Nov 14 '21 at 02:13
  • By the way, I tried your code snippet again and it worked now; it might have been a mistake on my part. – CoffeeKid Nov 14 '21 at 02:14
  • @CoffeeKid: that newline came from somewhere else; the code would print `10` for the newline if it was present. – Jonathan Leffler Nov 14 '21 at 02:14
  • @TedLyngmo By the way, by "with a newline at the end" I meant the input, not the output. So yes, it did indeed get removed ;) Thanks a lot! – CoffeeKid Nov 14 '21 at 02:19
  • @TedLyngmo [There are buggy `getline()` implementations out there](https://stackoverflow.com/questions/46313018/getline-returns-1-eof-not-set-errno-not-set-when-given-very-large-input). `chars_read != -1` would be better as `chars_read > 0`. – Andrew Henle Nov 14 '21 at 11:28
  • @AndrewHenle `chars_read > 0` will be `true` if `getline` returns `(size_t)-1` so that's not the proper test alone. It would have to be `chars_read != -1 && chars_read > 0` to work. I think it's better not to taint the answer with things dealing with buggy implementations. I hope the buggy implementations have gotten bugfixed by now. – Ted Lyngmo Nov 14 '21 at 11:32
  • @TedLyngmo [The POSIX documentation says `getline()` returns `ssize_t`](https://pubs.opengroup.org/onlinepubs/9699919799/functions/getdelim.html): "`ssize_t getline(char **restrict lineptr, size_t *restrict n, FILE *restrict stream);`" so the `-1` check *should* be redundant. But as `getline()` isn't standard C, perhaps you have a different implementation in mind? Or there's another bug out there... – Andrew Henle Nov 14 '21 at 11:36
  • @AndrewHenle You are correct! Jeez, I read `size_t` not `ssize_t` :-) Thanks, fixing answer. – Ted Lyngmo Nov 14 '21 at 11:37
  • It's too early in the morning. :-) – Andrew Henle Nov 14 '21 at 11:37
  • @AndrewHenle :-) I wrote this yesterday so I can't blame the morning :-D – Ted Lyngmo Nov 14 '21 at 11:38
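The debugging technique from these comments (print every byte as a number, where `10` is `'\n'` and `13` is `'\r'`) can be wrapped up as a small helper. This is a sketch; `dump_bytes` is a hypothetical name. It stops at the first `'\0'`, so it shows exactly what string functions such as `strcmp()` will see.

```c
#include <stdio.h>

/* Hypothetical debugging aid: print every byte of `s` as a decimal
 * number, up to (not including) the terminating '\0'.
 * 10 is '\n' (LF) and 13 is '\r' (CR). */
void dump_bytes(FILE *out, const char *s)
{
    for (const char *ch = s; *ch != '\0'; ++ch)
        fprintf(out, "%d ", (unsigned char)*ch);
    fputc('\n', out);
}
```

For example, `dump_bytes(stderr, line);` on a line that still holds `"abccba\r"` would print `97 98 99 99 98 97 13`, revealing the leftover carriage return.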

To replace a potential '\n' in a string with a '\0':

 line[strcspn(line, "\n")] = '\0';

To utilize the prior length chars_read:

 if (chars_read > 0 && line[chars_read - 1] == '\n') {
   line[--chars_read] = '\0';
 }
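The two approaches give different results only in a pathological case raised in the comments: a line containing an embedded null character. `strcspn()` stops at the first `'\0'`, while the length-based version uses the byte count from `getline()`. A sketch illustrating the difference; both function names are hypothetical.

```c
#include <string.h>
#include <sys/types.h>  /* ssize_t (POSIX) */

/* Search-based: only sees the bytes before the first '\0'. */
void strip_by_search(char *line)
{
    line[strcspn(line, "\n")] = '\0';
}

/* Length-based: uses the count returned by getline(), so it also
 * reaches bytes after an embedded '\0'. Returns the new length. */
ssize_t strip_by_length(char *line, ssize_t chars_read)
{
    if (chars_read > 0 && line[chars_read - 1] == '\n')
        line[--chars_read] = '\0';
    return chars_read;
}
```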

The malloc() call is one byte too short for OP's needs: it leaves no room for the terminating null character of line_rev.

    // char *line_rev = malloc(sizeof(char)*chars_read);
    char *line_rev = malloc(sizeof(char)*(chars_read + 1));  // +1 for the '\0'
    bar(line, line_rev, chars_read);
    if (strcmp(line, line_rev) == 0) {
        ...
    }
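Putting the pieces together, here is a sketch of the question's loop with the allocation fixed. It assumes that `bar()` (not shown in the question) writes the reversed line into `line_rev`, since the result is compared with `strcmp()`; `reverse_into` and `count_palindromes` are hypothetical stand-ins, and counting matches replaces the question's elided `...` branches. It also assumes a POSIX system, because `getline()` is not part of standard C.

```c
#define _POSIX_C_SOURCE 200809L
#include <stdio.h>
#include <stdlib.h>
#include <string.h>
#include <sys/types.h>

/* Hypothetical stand-in for OP's bar(): writes the first `len` bytes
 * of `src` into `dst` in reverse order and null-terminates `dst`,
 * so `dst` needs room for len + 1 bytes. */
static void reverse_into(const char *src, char *dst, size_t len)
{
    for (size_t i = 0; i < len; ++i)
        dst[i] = src[len - 1 - i];
    dst[len] = '\0';
}

/* Counts the lines of `input` that read the same forwards and backwards. */
long count_palindromes(FILE *input)
{
    char *line = NULL;
    size_t length = 0;
    ssize_t chars_read;
    long count = 0;

    while ((chars_read = getline(&line, &length, input)) != -1) {
        if (chars_read > 0 && line[chars_read - 1] == '\n') {
            line[--chars_read] = '\0';
            if (chars_read > 0 && line[chars_read - 1] == '\r')
                line[--chars_read] = '\0';  /* Windows line ending */
        }
        char *line_rev = malloc(chars_read + 1);  /* +1 for the '\0' */
        if (line_rev == NULL)
            break;
        reverse_into(line, line_rev, (size_t)chars_read);
        if (strcmp(line, line_rev) == 0)
            ++count;
        free(line_rev);
    }
    free(line);  /* getline() allocated it; free once after the loop */
    return count;
}
```

Passing the count to the reverse function, rather than calling `strlen()` again, follows the comment below that `getline()` already reports the length.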
chux - Reinstate Monica
  • Really not needed with `getline()`, which reports how many characters it read (including the newline but not the trailing null) into the array it is working with. – Jonathan Leffler Nov 14 '21 at 01:55
  • I am very curious, since @Shawn suggested a very similar answer to yours but with `= 0` instead of `= '\0'`, and it worked just fine too (just like your suggestion). Could you elaborate on what's the difference, and why both approaches work? – CoffeeKid Nov 14 '21 at 02:00
  • @JonathanLeffler Agreed. Of course, the line may lack a final `'\n'` at end-of-file. – chux - Reinstate Monica Nov 14 '21 at 02:00
  • @CoffeeKid `0`, `'\0'`, `00`, `0.0` and others are all a _zero_. Much of it is a style difference. – chux - Reinstate Monica Nov 14 '21 at 02:01
  • One major difference between `0.0` and the other alternatives is the type: the others are all integers, but `0.0` is a `double` value. – Jonathan Leffler Nov 14 '21 at 02:02
  • @Reinstate Monica Oh right, because chars are basically ints in C? – CoffeeKid Nov 14 '21 at 02:04
  • @CoffeeKid I prefer to use `'\0'` when assigning to a `char`, `0` to an integer type and `0.0` to a FP type. – chux - Reinstate Monica Nov 14 '21 at 02:04
  • @CoffeeKid `char` is an _integer_ type, yet `0`, `'\0'`, `00` are all type `int`. – chux - Reinstate Monica Nov 14 '21 at 02:06
  • If you're worried about files without trailing newlines, you can check before you zap the newline. If you're worried about CRLF line endings, you can check for the CR (`'\r'`) after you zap the NL (LF — `'\n'`). Remember to check for empty lines with just a newline — don't go accessing space before the start of `line`. That may make it sensible to use the `strcspn()` variation after all — but it depends on the level of paranoia you wish to acknowledge. – Jonathan Leffler Nov 14 '21 at 02:06
  • Can `getline()` ever return 0? – Shawn Nov 14 '21 at 02:06
  • @Shawn Recall that `getline()` is not a standard C library function, so depending on the implementation, who knows? Common Linux versions do not return 0, yet this post is not tagged as such. Recall also that the first (or a later) character read could pathologically be a _null character_, such that `strlen(line) != chars_read`. – chux - Reinstate Monica Nov 14 '21 at 02:09
  • @Shawn: Interesting question – my understanding is that the answer is "No". The POSIX specification for [`getline()`](https://pubs.opengroup.org/onlinepubs/9699919799/functions/getline.html) says: _Upon successful completion, … getline() … shall return the number of bytes written into the buffer, including the delimiter character if one was encountered before EOF, but excluding the terminating NUL character. If the [EOF] indicator … is set, or if no characters were read and the stream is at [EOF], the [EOF] indicator for the stream shall be set and the function shall return -1._ […continued…] – Jonathan Leffler Nov 14 '21 at 02:14
  • […continuation…] _If an error occurs, the error indicator for the stream shall be set, and the function shall return -1 and set errno to indicate the error._ That does not leave a way for `0` to be a valid return. If there was no data, then it returns `-1` (not `EOF`, though they are usually the same value); if there was any data, it returns the number of bytes read, which must be at least `1`. – Jonathan Leffler Nov 14 '21 at 02:15
  • I thought it would be obvious I'm talking about the POSIX function. – Shawn Nov 14 '21 at 02:32
  • @JonathanLeffler That's my understanding too. Still, checking that the length is 1 or more is a harmless bit of redundancy. – Shawn Nov 14 '21 at 02:37
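To make the type discussion above concrete: in C (unlike C++) a character constant such as `'\0'` has type `int`, and `0.0` has type `double`. The sketch below checks both at compile time with C11 `_Static_assert` and shows that every spelling stores the same zero byte into a `char`; `clear_first_four` is a hypothetical name used only for illustration.

```c
/* C11 compile-time checks: in C a character constant has type int
 * (in C++ it would be char), and 0.0 has type double. */
_Static_assert(sizeof('\0') == sizeof(int), "'\\0' is an int in C");
_Static_assert(sizeof(0.0) == sizeof(double), "0.0 is a double");

/* Hypothetical helper: all four assignments store the same zero byte;
 * only the spelling (and the type of the right-hand side) differs. */
void clear_first_four(char *s)
{
    s[0] = '\0';  /* conventional when writing to a char */
    s[1] = 0;     /* plain int zero */
    s[2] = 00;    /* octal int zero */
    s[3] = 0.0;   /* double converted to char: still zero, unusual style */
}
```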