File parsing in C

Question

I have a file and I want to read its content line by line with fgets(). There are 10 lines in this file. Among other things, each line should contain either the word "day" (which occurs 5 times in the file) or the word "night" (which also occurs 5 times), both in lower case.

Each line may also contain whitespace before or after the word "day" or "night", and it must also hold either a number (3 or 11) or one of the lower-case letters a, b or c.

For example:

day 3
 night     11
night a
night   b
day 11
   night    c
night 3
 day    a
day     c
day b

My idea is to first check each line, up to the \n, with strcmp() to see whether "day" or "night" occurs. If so, I want to know whether that "day" or "night" is accompanied by a number (3 or 11) or a letter (a, b or c). My thought here is the following: what if I delete all whitespace in each line and then determine the number or letter that follows "day" or "night"? The problem is that I got stuck here and I do not know the best way to do this. All my ideas are way too inconvenient to implement.

You can't use `strcmp` unless both strings are NUL-terminated. Consider `memcmp` or manually testing the bytes in memory. – Myst, Dec 16 '18 at 16:56
[Parsing](https://en.wikipedia.org/wiki/Parsing) is a well-known problem. You also need to do some [lexical analysis](https://en.wikipedia.org/wiki/Lexical_analysis), and then you can use [recursive descent parsing](https://en.wikipedia.org/wiki/Recursive_descent_parser) techniques. Read a good compiler textbook, such as the [Dragon book](https://en.wikipedia.org/wiki/Compilers:_Principles,_Techniques,_and_Tools); about half of it covers parsing techniques. Or you might simply use [regular expression](https://en.wikipedia.org/wiki/Regular_expression) techniques. – Basile Starynkevitch, Dec 16 '18 at 16:59
If your thoughts are too inconvenient to implement, you need new thoughts or you need to overcome the inconvenience. If you don't try to implement something, Stack Overflow is much less likely to be able to help. – mah, Dec 16 '18 at 16:59
Read also about the [string processing functions](https://en.cppreference.com/w/c/string/byte) available in standard C. Perhaps [strstr](https://en.cppreference.com/w/c/string/byte/strstr) might be enough. But in 2018, [UTF-8 is used everywhere](http://utf8everywhere.org/). You first need to specify *exactly* what the possible inputs are. Could my family name `Starynkevitch` appear in it? Could it appear in Cyrillic letters: `Старынкевич`? What would you do for these? – Basile Starynkevitch, Dec 16 '18 at 17:02

Answer (score 0, answered Dec 16 '18 at 17:03)

Use the fgets() buffer as input to sscanf() and let sscanf() do the whitespace work for you:

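```c
char buf[256], w1[10], w2[10];               /* buf size is arbitrary */
while (fgets(buf, sizeof buf, handle)) {     /* handle: an open FILE * */
    if (sscanf(buf, "%9s%9s", w1, w2) != 2)
        continue;                            /* error: malformed line */
    // w1 is "day" or "night"
    // w2 is "a", "b", ... or "11", "3", ...
}
```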

pmg

This particular code won't handle long words nicely, such as my family name (13 letters). And what about it in Cyrillic, UTF-8 encoded (Старынкевич, which takes 22 bytes)? – Basile Starynkevitch Dec 16 '18 at 17:05
@BasileStarynkevitch: you're correct, I assumed 9-character strings were good enough. – pmg Dec 16 '18 at 17:07

@pmg, while I'm sure the code this is actually for isn't critical, that kind of programming practice leads to that kind of programming everywhere. Things like this can easily become the cause of an exploitable buffer overflow in critical code in the future. Even though you know better than to write that kind of code where it counts, people reading your answer may not (and it's scary how many people copy code without understanding it). – mah Dec 16 '18 at 18:27
Thanks for the heads-up, @mah. I don't see the buffer overflow in my snippet, though. – pmg Dec 16 '18 at 20:45
@pmg I'm sorry, you're correct that you don't have a buffer overflow here, because your `sscanf()` format string wisely limits the input. I'm concerned, though, that such a nuance is likely to be lost on anyone whose programming experience is at the level where they would need to seek help with this problem. – mah Dec 17 '18 at 22:15
