The key to using flex for problems like this is understanding the "maximal-munch" rule. The rule is simple: Flex always picks the action corresponding to the pattern which matches the longest string (starting with the current input point; flex never "searches" for a match.) If more than one pattern matches the same longest substring, then the first pattern in the flex description is chosen. That means that the order of rules is important.
This is described at greater length in the Flex manual section on How the Input is Matched.
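As a rough illustration of that selection rule, here is a minimal sketch in plain C. It is not how flex works internally (flex compiles all the patterns into a single automaton); the three matcher functions, the rules table and pick_rule are hypothetical names invented for this example. Each matcher reports how many characters it matches at the current input point, and the picker keeps the longest match, preferring the earliest rule on a tie.

```c
#include <ctype.h>
#include <stddef.h>

/* Each hypothetical "pattern" is a function that returns how many
 * characters it matches at the start of s (0 means no match). */
typedef size_t (*matcher_fn)(const char *s);

/* One or more digits, like [[:digit:]]+ */
static size_t match_integer(const char *s) {
    size_t n = 0;
    while (isdigit((unsigned char)s[n])) n++;
    return n;
}

/* Digits, a dot, then optional digits, like [[:digit:]]+"."[[:digit:]]* */
static size_t match_decimal(const char *s) {
    size_t n = match_integer(s);
    if (n == 0 || s[n] != '.') return 0;
    n++;
    while (isdigit((unsigned char)s[n])) n++;
    return n;
}

/* One or more non-whitespace characters, like [^[:space:]]+ */
static size_t match_word(const char *s) {
    size_t n = 0;
    while (s[n] != '\0' && !isspace((unsigned char)s[n])) n++;
    return n;
}

/* Rules in priority order, as they would appear in a flex file. */
enum { RULE_INTEGER, RULE_DECIMAL, RULE_WORD, RULE_COUNT };
static const matcher_fn rules[RULE_COUNT] = {
    match_integer, match_decimal, match_word
};

/* Returns the index of the winning rule, or -1 if nothing matches.
 * The strict '>' comparison keeps the earliest rule when lengths tie. */
int pick_rule(const char *s, size_t *len_out) {
    int best = -1;
    size_t best_len = 0;
    for (int i = 0; i < RULE_COUNT; i++) {
        size_t len = rules[i](s);
        if (len > best_len) {
            best = i;
            best_len = len;
        }
    }
    if (len_out) *len_out = best_len;
    return best;
}
```

With these rules, the input 3, is claimed by the catch-all word rule because that rule matches two characters while the integer rule matches only one. The input 42 is matched equally by the integer and word rules, so the integer rule wins because it is listed first.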
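So let's suppose that you are interested in matching complete words, where "words" are non-empty sequences of arbitrary non-whitespace characters separated by whitespace. (So, for example, the line 3, 4 and 5. consists of the four words "3,", "4", "and" and "5."; with the rules below, only "4" and "5." would be reported as valid, because the comma is part of the first word.)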
It's easy to identify the four possibilities:
- Decimal integers
- Decimal floating point
- Hexadecimal integers
- Any other word.
We also need to ignore whitespace, other than recognizing it as a word separator.
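If we put the rules in that order, we can be confident that the correct rule will be chosen for each word, because of the maximal munch rule: the catch-all pattern always matches the whole word, so a more specific pattern can only win when it also matches the whole word, and it wins that tie because it comes first.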
So here's the entire flex file (except for the definition of main):
%option noinput nounput noyywrap nodefault
%%
[[:space:]]+ { /* Ignore whitespace */ }
[+-]?[[:digit:]]+ { printf("%s valid\n", yytext); /* Decimal integer */ }
[+-]?[[:digit:]]+"."[[:digit:]]* { printf("%s valid\n", yytext); /* Decimal point */ }
[+-]?"."[[:digit:]]+ { printf("%s valid\n", yytext); /* Decimal point */ }
[+-]?0[xX][[:xdigit:]]+ { printf("%s valid\n", yytext); /* Hexadecimal integer */ }
[^[:space:]]+ { printf("%s invalid\n", yytext); /* Any word not matched by above rules */ }
Notes
I've used ordinary printf statements here. You're free to use C++ streams, of course, but I prefer to use either stdio.h or iostreams, but not both. It might be considered cleaner to #include <stdio.h>, but in fact Flex already does that because it needs it for its own purposes.
The %option statement tells flex that you don't need yywrap (which means you don't need to provide one or link with -lfl), that you don't use input or unput (which means you can compile with -Wall without getting unused function warnings) and that you don't expect flex to need to insert a default rule (which saves you from embarrassing errors, because flex will warn you if there is anything which might not match any rule.)
I used [[:xdigit:]]+ in the hexadecimal pattern, which allows both upper and lower-case hex digits. If that's not desired, you could replace it with [0-9A-F] as in your original code, but your examples seem to indicate that your original code was not correct. Of course, you could write out the POSIX character classes, but I find them more readable. See the Flex manual section on Patterns for a complete list.
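To make the difference between the two hexadecimal character classes concrete, here is a small C sketch of the per-character test each one performs. The function names is_hex_any_case and is_hex_upper_only are hypothetical and exist only for this illustration; the first relies on the standard isxdigit from <ctype.h>, which accepts the same characters as [[:xdigit:]].

```c
#include <ctype.h>
#include <stdbool.h>

/* Same characters as [[:xdigit:]]: 0-9, a-f and A-F. */
bool is_hex_any_case(char c) {
    return isxdigit((unsigned char)c) != 0;
}

/* Same characters as [0-9A-F]: digits and upper-case letters only. */
bool is_hex_upper_only(char c) {
    return (c >= '0' && c <= '9') || (c >= 'A' && c <= 'F');
}
```

Under the stricter class, a word such as 0x1f would no longer match the hexadecimal rule and would fall through to the catch-all rule instead.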