
I'm currently trying to parse a text file and find any hexadecimal numbers in it. If a hexadecimal number is invalid, I want it displayed as not a number.

input:   

-0xA98F 
0XA98H
0x123 
0xabc

expected output:
-0xA98F valid
 0x123  valid
 0xabc  not valid
 0xA98H not valid

My problem is that for input like 0xA98H, the scanner outputs 0xA98 and displays it as a number. My goal is to get output like my example, but I don't see any way to solve this. This is my current rule:

  [-]?[0][x|X][0-9A-F]+ {cout << yytext << " Number" << endl; }
rici
sippycup
  • In short, instead of displaying 0xA98, I want this value to be rejected and displayed as 0xA98H = not a number – sippycup Mar 04 '17 at 07:38
  • Why did you create a new question instead of editing your original one? I would have improved my answer to your previous question [lex parser not displaying hex correctly](http://stackoverflow.com/questions/42592185/lex-parser-not-displaying-hex-correctly) if you had improved the question. – Scheff's Cat Mar 04 '17 at 16:09
  • It is easy to modify the patterns so that the first number is matched and the second is not. However, this effort is worthless as long as I don't know 1. what else may occur in the input, and 2. what else has to be matched (or not). Please provide a (not too long) sample input and the expected output. – Scheff's Cat Mar 04 '17 at 16:12
  • The only things that will occur in the input are various valid and invalid hex values. I edited my post to include an input example. – sippycup Mar 04 '17 at 17:00

1 Answer


The following sample code accepts hex numbers according to the OP's requirements:

%{
#include <iostream>
#include <string>
using namespace std;

static bool error = false;
static string buffer;
%}

HEX "-"?"0"[xX][0-9A-F]+
EOL (\n|\r|\r\n)

%%

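{HEX} { buffer += yytext; /* collect a valid hex number */ }
" " { /* ignore spaces */ }
{EOL}+ {
  /* listed before the . rule: on equal-length matches the first rule wins, so a lone \r ends a line */
  cout << buffer << '\t' << (error ? "not valid" : "valid") << endl;
  buffer.clear();
  error = false;
}
. { buffer += yytext; error = true; /* anything else makes the line invalid */ }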

%%

int main(int argc, char **argv) { return yylex(); }

int yywrap() { return 1; }

Compiled with flex and g++, and tested on Cygwin:

$ flex -otest-hex.cc test-hex.l ; g++ -o test-hex test-hex.cc

$ echo '-0xA98F                                              
> 0XA98H
> 0x123
> 0xabc
>' | ./test-hex
-0xA98F valid
0XA98H  not valid
0x123   valid
0xabc   not valid

$

Spaces and empty lines are ignored.

(\n|\r|\r\n) denotes a pattern that matches Unix-like (\n), classic Mac OS-like (\r), and DOS/Windows-like (\r\n) line endings, in this order.

Scheff's Cat
  • I was also curious: would this code ignore other numbers, such as decimals and integers? – sippycup Mar 04 '17 at 20:31
  • @sippycup It would/should, due to the `"0"[xX]` part in the pattern. The following rules apply in the flex-generated scanner: a pattern must match completely to match a piece of text. If multiple patterns match, the longest match wins. If multiple patterns match text of equal length, the rule listed first wins. (You'll find the rest in `man flex`, `info flex`, or the [Flex Manual](ftp://ftp.gnu.org/old-gnu/Manuals/flex-2.5.4/html_mono/flex.html).) – Scheff's Cat Mar 05 '17 at 08:08