
I want the function to return 0 whenever it reaches a newline, but that part is not working; getting each word from the file works fine. A speedy response would be appreciated.

The data in the input file looks like this:

blossom flower
bewilder confound confuse perplex
dwell live reside

The code:

int getWord(FILE * in, char str[]){
    int ch;
    int i = 0;
    while(!isalpha(ch = getc(in)) && ch != EOF);
        if(ch == EOF) return -1;
    str[i++] = tolower(ch);
    while(isalpha(ch = fgetc(in)) && ch != EOF){
            if(i < MAX_WORD)
                str[i++] = tolower(ch);
    }
    if(ch == '\n') return 0;
    str[i] = '\0';
    return 1;
}     
Brad Larson
PinC
  • Your function won't add the `\0` to the string if it reads a newline. – humodz Oct 02 '14 at 22:05
  • I know; I want to do something when a newline is read. – PinC Oct 02 '14 at 22:12
  • You need to define `ch` as an `int`, not as a `char`. `EOF` is a negative `int` value that's unequal to any valid character. – Keith Thompson Oct 02 '14 at 22:13
  • What? I want the function to return an int according to the different cases, so I don't see what you are getting at with changing a char to an int. – PinC Oct 02 '14 at 22:17
  • Because the man page for `getc` says it returns an int: http://linux.die.net/man/3/getc. If all else fails, read the manual. – pm100 Oct 02 '14 at 22:44
  • OK, I understand the int thing, but all I want to know is why it is not reading '\n'. – PinC Oct 02 '14 at 22:48
  • There is no punctuation in the input file; it is strictly words. – PinC Oct 02 '14 at 22:52
  • Your first loop should not check for EOF in the condition, only check for EOF in the body of the loop. – EOF Oct 02 '14 at 22:57
  • @EOF: The first loop has a semicolon at the end of the line, so the condition is mis-indented and actually is a statement after the loop. The code is poorly laid out, in other words. – Jonathan Leffler Oct 02 '14 at 23:00
  • Can you clarify why so? – PinC Oct 02 '14 at 23:01
  • the indentation of `if(ch == EOF) return -1;` suggests it is part of the while-loop. It is not, so it should not be indented. – EOF Oct 02 '14 at 23:05
  • You should post the code that is calling the `getWord` function. But to get a jump on that, does your input file have Windows-style line endings? Are you opening the file in binary mode? – Samuel Edwin Ward Oct 02 '14 at 23:06
  • Yes, but what I want to know is why it does not return 0 when it reaches a newline. – PinC Oct 02 '14 at 23:07
  • This is my getWord function; it is not in main. I'm reading from a normal text file (.txt). – PinC Oct 02 '14 at 23:11
  • Besides the "poorly laid out code", my question still isn't answered. I just want to know what is causing it to not `return 0`. – PinC Oct 02 '14 at 23:15
  • Do not destroy your question now that you have an answer to it! I've rolled back your change; do not undo that rollback or otherwise vandalize your question. – Jonathan Leffler Oct 18 '14 at 00:31

1 Answer


Direct answer to question in comment

"My question still isn't answered; I just want to know what is causing it to not return 0."

Because:

  1. you are running on Windows,
  2. the file is opened as a binary file, and
  3. the character that terminates words at the end of a line is CR and not LF.

The second loop therefore stops on the CR, so the `ch == '\n'` test fails and the function returns 1. When you next call the function, the first loop reads the LF and skips it because it is not alphabetic.
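One way to check this hypothesis is to count the CR and LF bytes the program actually sees. The following is a minimal sketch, not part of the original code; the names `eol_counts` and `count_line_endings` are made up for illustration.

```c
#include <stdio.h>

/* Hypothetical helper: tallies the CR and LF bytes left in a stream. */
struct eol_counts { long cr; long lf; };

struct eol_counts count_line_endings(FILE *in)
{
    struct eol_counts counts = { 0, 0 };
    int ch;                         /* int, so EOF stays distinguishable */

    while ((ch = getc(in)) != EOF)
    {
        if (ch == '\r')
            counts.cr++;
        else if (ch == '\n')
            counts.lf++;
    }
    return counts;
}
```

Open the data file with `fopen(name, "rb")`, pass the stream to `count_line_endings`, and print both fields. A non-zero CR count in binary mode means the file has CRLF line endings, and the scenario above applies whenever the CR reaches `getWord`.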

Main answer

Succinctly, your code does recognize newlines — at least on Linux.

#include <stdio.h>
#include <ctype.h>

enum { MAX_WORD = 50 };

static
int getWord(FILE *in, char str[])
{
    int ch;
    int i = 0;
    while (!isalpha(ch = getc(in)) && ch != EOF)
        ;
    if (ch == EOF)
        return -1;
    str[i++] = tolower(ch);
    while (isalpha(ch = fgetc(in)) && ch != EOF)
    {
        if (i < MAX_WORD)
            str[i++] = tolower(ch);
    }
    if (ch == '\n')
        return 0;
    str[i] = '\0';  // Bug; should be before the if
    return 1;
}

int main(void)
{
    char buffer[MAX_WORD + 1];  // Room for MAX_WORD letters plus '\0'
    int rc;

    while ((rc = getWord(stdin, buffer)) >= 0)
        printf("Got: %d (%s)\n", rc, buffer);
    return 0;
}

Given the input file:

blossom flower
bewilder confound confuse perplex
dwell live reside

The program produces the output:

Got: 1 (blossom)
Got: 0 (flowerm)
Got: 1 (bewilder)
Got: 1 (confound)
Got: 1 (confuse)
Got: 0 (perplex)
Got: 1 (dwell)
Got: 1 (live)
Got: 0 (residex)

Note that you get stray leftover characters in the word when a newline ends it (when 0 is returned) and the current word is shorter than the previous one, because the function returns before writing the terminating '\0'. You could get worse behaviour if the last word on the line is longer than any prior word and the stack is messy enough, since the buffer then has no terminator at all. You can fix that bug by moving the null termination before the if condition. The output is then:

Got: 1 (blossom)
Got: 0 (flower)
Got: 1 (bewilder)
Got: 1 (confound)
Got: 1 (confuse)
Got: 0 (perplex)
Got: 1 (dwell)
Got: 1 (live)
Got: 0 (reside)
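For reference, the fix described above (null-terminating before testing for the newline) can be sketched as follows. This is one possible layout rather than the only correct one. It assumes the caller's buffer holds at least `MAX_WORD + 1` bytes, and it also tests for `EOF` before calling `isalpha`.

```c
#include <stdio.h>
#include <ctype.h>

enum { MAX_WORD = 50 };

/* Returns -1 at EOF, 0 if the word was ended by '\n', 1 otherwise.
 * Assumption of this sketch: str has room for MAX_WORD + 1 bytes. */
int getWord(FILE *in, char str[])
{
    int ch;
    int i = 0;

    /* Skip everything that is not a letter. */
    while ((ch = getc(in)) != EOF && !isalpha(ch))
        ;
    if (ch == EOF)
        return -1;

    /* Collect letters; extra letters of an over-long word are dropped. */
    do
    {
        if (i < MAX_WORD)
            str[i++] = (char)tolower(ch);
    } while ((ch = getc(in)) != EOF && isalpha(ch));

    str[i] = '\0';                  /* Terminate before choosing the return value */
    return (ch == '\n') ? 0 : 1;
}
```

As in the original, a word followed by a space and then a newline still returns 1, because only the single character that ends the word is examined.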

Note that on Windows, if the program gets to read a '\r' (the CR part of the CRLF line endings), then the zero return would be skipped because the character terminating the word was '\r', and in the next call to the function, the first loop skips the '\n'.
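If the CR really does reach the program, one option is to make the function accept either line ending. The sketch below is a hypothetical variant (the name `getWordAnyEol` is invented here), which peeks at the character after a CR and treats CRLF like a lone LF. It makes the same buffer-size assumption of `MAX_WORD + 1` bytes.

```c
#include <stdio.h>
#include <ctype.h>

enum { MAX_WORD = 50 };

/* Hypothetical variant of getWord that treats "\r\n" like "\n".
 * Returns -1 at EOF, 0 if the word ended a line, 1 otherwise. */
int getWordAnyEol(FILE *in, char str[])
{
    int ch;
    int i = 0;

    while ((ch = getc(in)) != EOF && !isalpha(ch))
        ;
    if (ch == EOF)
        return -1;

    do
    {
        if (i < MAX_WORD)
            str[i++] = (char)tolower(ch);
    } while ((ch = getc(in)) != EOF && isalpha(ch));
    str[i] = '\0';

    if (ch == '\r')
    {
        /* Look at the byte after CR: consume an LF, push anything else back. */
        ch = getc(in);
        if (ch != '\n' && ch != EOF)
            ungetc(ch, in);
    }
    return (ch == '\n') ? 0 : 1;
}
```

The single `ungetc` is safe because the standard guarantees at least one character of pushback. Opening the file in text mode on Windows is the simpler cure when that is possible.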

Please note that indicating platform (Unix vs Windows) would help clarify the question and get answers more quickly.

Note that when I create a DOS (Windows) format file, data.dos, and read that with the same (bug fixed) binary (running on an Ubuntu 14.04 derivative), the output is:

Got: 1 (blossom)
Got: 1 (flower)
Got: 1 (bewilder)
Got: 1 (confound)
Got: 1 (confuse)
Got: 1 (perplex)
Got: 1 (dwell)
Got: 1 (live)
Got: 1 (reside)

This corresponds exactly to the 'CR terminates the word and the first loop skips the newline' scenario. You could also debug by adding print statements in strategic places:

#include <stdio.h>
#include <ctype.h>

enum { MAX_WORD = 50 };

static
int getWord(FILE *in, char str[])
{
    int ch;
    int i = 0;
    while (!isalpha(ch = getc(in)) && ch != EOF)
    {
        if (ch == '\n') printf("Got-1 '\\n'\n");
        else if (ch == '\r') printf("Got-1 '\\r'\n");
        else printf("Got-1 '%c'\n", ch);
    }
    if (ch == EOF)
        return -1;
    str[i++] = tolower(ch);
    while (isalpha(ch = fgetc(in)) && ch != EOF)
    {
        if (i < MAX_WORD)
            str[i++] = tolower(ch);
    }
    if (ch == '\n') printf("Got-2 '\\n'\n");
    else if (ch == '\r') printf("Got-2 '\\r'\n");
    else printf("Got-2 '%c'\n", ch);
    str[i] = '\0';
    if (ch == '\n')
        return 0;
    return 1;
}

int main(void)
{
    char buffer[MAX_WORD + 1];  // Room for MAX_WORD letters plus '\0'
    int rc;

    while ((rc = getWord(stdin, buffer)) >= 0)
        printf("Got: %d (%s)\n", rc, buffer);
    return 0;
}

And on the Unix file, the output is now:

Got-2 ' '
Got: 1 (blossom)
Got-2 '\n'
Got: 0 (flower)
Got-2 ' '
Got: 1 (bewilder)
Got-2 ' '
Got: 1 (confound)
Got-2 ' '
Got: 1 (confuse)
Got-2 '\n'
Got: 0 (perplex)
Got-2 ' '
Got: 1 (dwell)
Got-2 ' '
Got: 1 (live)
Got-2 '\n'
Got: 0 (reside)

And with the Windows file:

Got-2 ' '
Got: 1 (blossom)
Got-2 '\r'
Got: 1 (flower)
Got-1 '\n'
Got-2 ' '
Got: 1 (bewilder)
Got-2 ' '
Got: 1 (confound)
Got-2 ' '
Got: 1 (confuse)
Got-2 '\r'
Got: 1 (perplex)
Got-1 '\n'
Got-2 ' '
Got: 1 (dwell)
Got-2 ' '
Got: 1 (live)
Got-2 '\r'
Got: 1 (reside)
Got-1 '\n'

Note that Unix/Linux does not treat the CRLF combination specially; they are just two adjacent characters in the input stream.

Jonathan Leffler
  • I'm using Windows, so the newline is `'\r'` followed by `'\n'`? And it doesn't work because the next function call skips that character? – PinC Oct 02 '14 at 23:23
  • @hinkatana: Yes, no, maybe. On Windows, two consecutive characters, `'\r'` and `'\n'` -- aka CRLF or carriage return, line feed -- mark the end of line when the file is examined as a binary file. Normally, if the file is opened as a text file, the runtime library will map CRLF into a single `'\n'` character. However, if the file was opened as a binary file (`fopen("file.txt", "rb")` for example), then the CR would be made available to the program and would show the behaviour I described. So, it all hinges on how the file is opened. Standard input is usually opened as a text file. – Jonathan Leffler Oct 02 '14 at 23:26
  • Hmm, OK, so it's supposed to be a `'\n'` since I opened it as a regular text file. But I'm still a little confused about why it's not reaching the `return 0`, going by your solution and comments. All I can gather is that in some instances the `'\n'` is lost or overwritten between function calls? – PinC Oct 02 '14 at 23:33
  • Nothing worked or helped, so I'm just going to try something else, but thanks anyway. – PinC Oct 03 '14 at 01:26
  • Did you try a copy of the diagnostic program above? You can make the code print each character as it is read. You didn't save the file as RTF or anything funny, did you? I regard it as very unlikely, but at this point, everything needs to be checked. Since you've not shown your calling code and how the file is opened, we can't help much more. – Jonathan Leffler Oct 03 '14 at 01:28