
I've started learning how to use strings, but I'm a little bit confused about the whole concept. I'm trying to read word by word from a file that contains strings.

Here is the file:

Row, row, row your boat,
Gently down the stream.
Merrily, merrily, merrily, merrily,
Life is but a dream. 

My approach was to use

char hold[25];

//  Statement
while(fscanf(fpRow, "%s", hold) != EOF)
    printf("%s %zu\n", hold, strlen(hold));   // strlen returns size_t, so %zu rather than %d

My task is to read each word and exclude every , and . in the file. To do that, would the right approach be to use %[^,.] instead of %s? When I tried it, fscanf read only the first word of the file and the loop never exited. Can someone explain what I'm doing wrong? Also, if it's not too much to ask, what's the difference between fscanf and fgets? Thanks.

while(fscanf(fpRow, "%24[^,.\n ]", hold) != EOF)
{
    fscanf(fpRow, "%*c", hold);
    printf("%s %zu\n", hold, strlen(hold));
}
Nathan
  • Do you mean `fgetc()`? This only reads a single character at a time whereas `fscanf()` can read more and also convert to other data types. – Code-Apprentice Feb 17 '13 at 22:55
  • I would always want to use `fgets` to get a whole line, and process it separately in my loop. It makes it certain that you eventually get to FEOF; if you don't finish scanning a line, I don't know that `fscanf` will behave itself for you if you don't get to the end of the line with your format specifier... – Floris Feb 17 '13 at 22:57
  • @Code-Guru- ah, sorry it was supposed to be 'fgets'. Sorry I'll edit it right now. – Nathan Feb 17 '13 at 23:04
  • Wouldn't you want to ensure `fscanf(...) == 1`, to ensure that one value was read, rather than ensuring it's `!= EOF`? If `%[^,.]` "matches a non-empty sequence of bytes from a set of expected bytes (the scanset)", then won't `fscanf("%[^,.]", ...) == 0` when "..." is entered, indicating a failure of a different kind? ;) – autistic Feb 17 '13 at 23:49
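
To illustrate the `fgets` suggestion from the comments above, here is a minimal sketch that reads a whole line at a time and then splits it into words. The function name `words_by_line` and the 256-byte line buffer are assumptions made for this example, not something from the original post.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

/* Hypothetical helper: reads fp line by line, prints every word with its
 * length, and returns the number of words seen. */
int words_by_line(FILE *fp)
{
    char line[256];   /* assumed upper bound on line length for this sketch */
    int count = 0;

    assert(fp != NULL);

    /* fgets stops after a newline or after sizeof line - 1 characters,
     * so a longer line would simply arrive in several pieces. */
    while (fgets(line, sizeof line, fp) != NULL) {
        /* strtok splits the line in place on any of the delimiter characters. */
        for (char *w = strtok(line, ",. \n"); w != NULL; w = strtok(NULL, ",. \n")) {
            printf("%s %zu\n", w, strlen(w));
            count++;
        }
    }
    return count;
}
```

Calling `words_by_line(fpRow)` on the file from the question should visit each word once. The practical difference from `fscanf` is that `fgets` always consumes input up to the end of the line (or the buffer limit), so the loop is guaranteed to make progress toward end of file, and the parsing happens afterwards on an ordinary string.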

1 Answer


Yes, %[^,. ] should work -- but keep in mind that when you do that, it will stop reading when it encounters any of those characters. You then need to read that character from the input buffer, before trying to read another word.
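
As a rough sketch of that idea: the version below discards a whole run of delimiters with `%*[,. \n]` rather than a single character, and it tests for a return value of 1 instead of comparing against EOF. The helper names `next_word` and `print_words` and the `WORD_MAX` constant are made up for this example.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

#define WORD_MAX 24  /* longest piece stored; buffers need WORD_MAX + 1 bytes */

/* Hypothetical helper: stores the next word in buf (at least WORD_MAX + 1
 * bytes). Returns 1 if a word was stored, 0 at end of input. */
int next_word(FILE *fp, char *buf)
{
    assert(fp != NULL && buf != NULL);

    /* Discard any run of delimiters. If none is pending, the directive just
     * fails to match and consumes nothing; EOF means the input is exhausted. */
    if (fscanf(fp, "%*[,. \n]") == EOF)
        return 0;

    /* Read up to 24 non-delimiter characters; the width must match WORD_MAX. */
    return fscanf(fp, "%24[^,. \n]", buf) == 1;
}

/* Prints each word with its length and returns the word count. */
int print_words(FILE *fp)
{
    char hold[WORD_MAX + 1];
    int count = 0;

    while (next_word(fp, hold)) {
        printf("%s %zu\n", hold, strlen(hold));
        count++;
    }
    return count;
}
```

The `== 1` test matters because a scanset that matches nothing makes `fscanf` return 0, not EOF. With a `!= EOF` test, a delimiter sitting at the front of the input leaves the buffer unchanged, so the loop either spins forever or prints the previous word again. A word longer than 24 characters would come back in several pieces here.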

Also note that when you use either %s or %[...], you want to specify the length of the buffer, or you end up with something essentially like gets, where the wrong input from the user can/will cause buffer overflow.
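
A small sketch of bounding the read: when the buffer size is a compile-time constant you can write the width directly, as in `%24s` for a 25-byte buffer. When the size is only known at run time, one option is to build the format string first. The helper name `scan_token` is hypothetical, and it uses `sscanf` on a string only to keep the example short.

```c
#include <assert.h>
#include <stdio.h>

/* Hypothetical helper: copies one whitespace-delimited token from src into
 * dst without writing more than dst_size bytes. Returns 1 on success. */
int scan_token(const char *src, char *dst, size_t dst_size)
{
    char fmt[32];

    assert(src != NULL && dst != NULL);
    if (dst_size < 2)
        return 0;               /* no room for a character plus the '\0' */

    /* Builds e.g. "%24s" for a 25-byte buffer: the width counts characters
     * and excludes the terminating '\0' that scanf appends. */
    snprintf(fmt, sizeof fmt, "%%%zus", dst_size - 1);

    return sscanf(src, fmt, dst) == 1;
}
```

The format has to be built this way because, unlike in `printf`, a `*` in a `scanf` conversion does not take the width from an argument; it suppresses the assignment instead. With a 5-byte buffer, `scan_token("Merrily, merrily", buf, 5)` stores only `Merr` and leaves the rest of the token unread.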

Jerry Coffin
  • Man, it's been a while since I did much C programming. Since when does `scanf()` use regexes? – Code-Apprentice Feb 17 '13 at 22:53
  • `%*[^,. ]` will "skip" whatever is matched, so you don't need to read it into a variable... – Floris Feb 17 '13 at 22:53
  • @Code-Guru - see http://stackoverflow.com/questions/5999164/is-scanfs-regex-support-a-standard – Floris Feb 17 '13 at 22:55
  • @Jerry Coffin- how would I go about reading it from the input buffer? – Nathan Feb 17 '13 at 23:51
  • @Code-Guru: Since at least 7th Edition UNIX™ circa 1979, roughly when the `<stdio.h>` header was introduced, and it is a very limited form of regular expression: just a character class or negated character class. – Jonathan Leffler Feb 17 '13 at 23:53
  • @Nathan: You can read the next character with `getc(fpRow)`. Beware EOF, that's all. – Jonathan Leffler Feb 17 '13 at 23:54
  • @JonathanLeffler Well, I haven't been programming *that* long. I guess I just missed the whole character class thing when I first learned C. Of course, back then I didn't know about regexes, either. – Code-Apprentice Feb 17 '13 at 23:55
  • @Nathan: Since you're using `scanf` anyway, probably the easiest way is just a `%*c`. – Jerry Coffin Feb 17 '13 at 23:59
  • @JerryCoffin: For some reason I was able to eliminate all the , . \n and white spaces, but when I go to print, it prints way more words than the original file contains. I have to go to work now, but I'll post the code on top. – Nathan Feb 18 '13 at 00:17