Determining a hex value from a lex parser

Question

I'm currently trying to parse a text file and find any hexadecimal numbers within it. If a hexadecimal number is incorrect, I display it as not a number.

input:   

-0xA98F 
0XA98H
0x123 
0xabc

expected output:
-0xA98F valid
 0x123  valid
 0xabc  not valid
 0xA98H not valid

My problem is that for something like 0xA98H, it outputs 0xA98 and reports it as a number. My goal is to get output like my example; however, I do not see any solution to my problem.

  [-]?[0][x|X][0-9A-F]+ {cout << yytext << " Number" << endl; }

In short instead of displaying 0xA98 I want this value to be ignored and instead displayed as 0xA98H = not a number — sippycup, Mar 04 '17 at 07:38
Why did you create a new question instead of editing your original one? I would improve my answer to your previous question [lex parser not displaying hex correctly](http://stackoverflow.com/questions/42592185/lex-parser-not-displaying-hex-correctly) if you improved the question. — Scheff's Cat, Mar 04 '17 at 16:09
It is easy to modify the patterns so that the first number is matched and the second is not. However, this effort is worthless as long as I don't know (1) what else may occur in the input and (2) what else has to be matched (or not). Please provide a (not too long) sample input and the output you expect. — Scheff's Cat, Mar 04 '17 at 16:12
The only thing that will occur in the input is various valid and invalid hex values. I edited my post to add the input example. — sippycup, Mar 04 '17 at 17:00

score 3 · Accepted Answer · answered Mar 04 '17 at 17:46

The following sample code accepts hex numbers according to the OP's requirements:

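%{
#include <iostream>
#include <string>
using namespace std;

// State for the line currently being scanned.
static bool error = false;  // set when a character outside a hex number is seen
static string buffer;       // text of the current line, without spaces
%}

/* Optional sign, "0x" or "0X", then one or more digits 0-9 or upper-case A-F. */
HEX "-"?"0"[xX][0-9A-F]+
EOL (\n|\r|\r\n)

%%

{HEX} { buffer += yytext; }
" " { /* ignore spaces */ }
. { buffer += yytext; error = true; /* anything else invalidates the line */ }
{EOL}+ {
  /* end of line: report the collected text, then reset for the next line */
  cout << buffer << '\t' << (error ? "not valid" : "valid") << endl;
  buffer.clear();
  error = false;
}

%%

int main(int argc, char **argv) { return yylex(); }

int yywrap() { return 1; }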

Compiled with flex and g++ and tested on cygwin:

$ flex -otest-hex.cc test-hex.l ; g++ -o test-hex test-hex.cc

$ echo '-0xA98F                                              
> 0XA98H
> 0x123
> 0xabc
>' | ./test-hex
-0xA98F valid
0XA98H  not valid
0x123   valid
0xabc   not valid

$

Spaces and empty lines are ignored.

(\n|\r|\r\n) is a pattern that matches Unix-like line endings, classic Mac OS line endings, and DOS/Windows line endings (in this order).
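
For comparison, the same per-line check can be sketched in plain C++ without flex. This is only an illustration, not part of the scanner above: the helper name `is_valid_hex_line` is made up, and it assumes exactly one hex literal per line, which is slightly stricter than the scanner (the scanner would also accept several literals on one line as long as each one matches HEX).

```cpp
#include <regex>
#include <string>

// Hypothetical helper (not from the flex scanner): returns true if `line`,
// with spaces and carriage returns removed, is exactly one literal of the
// form  -?0[xX][0-9A-F]+  (same character classes as the HEX definition).
bool is_valid_hex_line(const std::string &line) {
    static const std::regex hex("-?0[xX][0-9A-F]+");
    std::string stripped;
    for (char c : line) {
        if (c != ' ' && c != '\r') {
            stripped += c;  // keep everything except spaces and CR
        }
    }
    // regex_match requires the whole string to match, so a trailing 'H'
    // or a lower-case digit makes the line invalid.
    return std::regex_match(stripped, hex);
}
```

Calling it on each line read with `std::getline` and printing "valid" or "not valid" should classify the four sample lines the same way as the scanner output shown above.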

I was also curious: would this code ignore other numbers, such as decimals and integers? — sippycup, Mar 04 '17 at 20:31
@sippycup It would/should due to the `"0"[xX]` part in the pattern. The following rules apply in the flex-generated scanner: A pattern must match completely to match a text. If multiple patterns match, the longest match wins. If multiple patterns match text of equal length, the first rule wins. (The rest you can find in `man flex`, `info flex`, or the [Flex Manual](ftp://ftp.gnu.org/old-gnu/Manuals/flex-2.5.4/html_mono/flex.html).) — Scheff's Cat, Mar 05 '17 at 08:08
