
This is what I know:

  1. ^ inside brackets matches a character that isn't one of the included inside the brackets.
  2. + Matches one or more appearances of the expression to its left (in my ex. [^0-9]).
  3. $ If I'm not mistaken, matches to an expression that ends with the expression to its left.

Then it seems this expression should match input that has at least one character that isn't a digit and that ends with that expression. For example, it should match: 1a, aaa, 2321a, 1b1b

and should not match: 111, 432423, asd3213

but it is unclear to me from running this rule what exactly it matches.

This is my full code:

%option noyywrap
%{
    #include<stdio.h>
%}

%%
[^0-9]+$ printf("not a number");
%%
int main()
{
    yylex();
    return 0;
}

And I'm using flex.

Output examples (sorry for the links; it won't let me upload a photo):

[1] https://ibb.co/qp3hB0r - doesn't match but prints back

[2] https://ibb.co/syZHjrw - doesn't match and eats it (why does it happen if I didn't add ".|\n" in the code?)

[3] https://ibb.co/s6S0tQh - matches and prints back

[4] https://ibb.co/VmZW7KR - same as the 3rd

[5] https://ibb.co/2vPfWhc - matched only the 11(?) and ate up the aa

I'm really confused as to what it actually matches and would appreciate the help.

John Bollinger
  • "it won't let me upload a photo" It seems you are dealing with text. Why would you want to include pictures of text if you can include the text itself in the question? Pictures of text are highly discouraged at SO. Just copy and paste as formatted text into your question. – Gerhardh Nov 15 '21 at 17:57
  • You might want to start by reading a couple of pages from the flex manual: [Pattern Syntax](http://westes.github.io/flex/manual/Patterns.html#Patterns) and [How the Input is Matched](http://westes.github.io/flex/manual/Matching.html#Matching). The second one is particularly important, since it explains what a flex scanner actually does (split the input into consecutive tokens, which is not the same as searching for a regular expression), and what it does when no pattern matches. After you read that, if there is still something you don't understand, feel free to ask a more precise question. – rici Nov 15 '21 at 18:24

1 Answer

This is what I know:

  1. ^ inside brackets matches a character that isn't one of the included inside the brackets.

That's an odd way to put it. More accurate would be that the whole bracket-enclosed fragment matches one character that is not (because of the ^) in the range '0' - '9'.

  2. + Matches one or more appearances of the expression to its left (in my ex. [^0-9]).

Again an odd way to put it. The + quantifier modifies the preceding fragment to match one or more appearances of whatever it otherwise would match exactly once.

  3. $ If I'm not mistaken, matches to an expression that ends with the expression to its left.

You are mistaken. The $ anchors the match to the end of a line -- the overall pattern matches only text that ends at the end of a line, as determined by immediately preceding a newline (and therefore not at the very end of the file). That's a restriction, not an extension: nothing is matched that wouldn't be matched by the pattern excluding the $, but there is an additional requirement that the match occur at the end of a line. That's not at all the same thing as matching text that ends with a match to the preceding pieces of the pattern.

Thus,

it seems this expression should match input that has at least one character that isn't a digit and that ends with that expression. For example, it should match: 1a, aaa, 2321a, 1b1b

No. Taking those as four separate examples, it would not match any of them unless they appeared at the end of a line. If they all did appear at the end of a line, then only aaa would be matched in its entirety; for each of the others, only the trailing a or b would be matched.

Note also, however, that when a flex scanner cannot match the input to any user-defined rule, its default rule is invoked, which copies the next input character to the standard output, consuming it. Therefore, if you present an input to your scanner that contains at least one non-digit at the end of a line, then it will eventually consume any preceding input up to the last digit, printing all of that on the standard output, before eventually matching that trailing portion of the line and printing "not a number".

John Bollinger
  • In (f)lex, `$` at the end of a pattern is *exactly* the same as the (f)lex trailing context operator subpattern `/\n`. Unlike many regex libraries, `$` does not match at the end of input; it only matches if there is at least one following character, which must be precisely a newline character (and not, for example, `\r\n`). So saying that it matches "at the end of a line" can be slightly misleading. While a text file *should* end with a newline, and most do, (f)lex is also used to scan strings constructed in memory, and it's pretty common to not add a newline at the end. That can create surprise. – rici Nov 15 '21 at 20:27
  • Thank you, @rici, I have clarified the meaning of "at the end of a line". – John Bollinger Nov 15 '21 at 20:38