How to disable parsing for a piece of text in a file?

Question

The structure of my file is:

`pragma TOKEN1_NAME TOKEN1_VALUE
`pragma TOKEN2_NAME TOKEN2_VALUE
`pragma TOKEN3_NAME TOKEN3_VALUE
`pragma TOKEN4_NAME TOKEN4_VALUE
 VHDL_TEXT{

 // A valid VHDL text goes here.
}
`pragma TOKEN2_NAME TOKEN2_VALUE
 VHDL_TEXT{

 // VHDL text
}

I need to pass the VHDL text as is to the output file. I can do that by adding a default rule at the end of the lex file:

Rule:  .    { append_to_buffer(*yytext); }

I also have a list of other rules in my Lex file to deal with the tokens.

The problem I am having is how to deal with the case where the VHDL text also contains some of the tokens that the Lex rules recognize.

In other words, I want to disable detection of further tokens once I find the text I am interested in, and start detecting them again once that text is over.

If you're using `flex` then read about [start conditions](http://flex.sourceforge.net/manual/Start-Conditions.html#Start-Conditions). — Some programmer dude, Jan 04 '15 at 09:24
Which version of VHDL do you want to parse? There are some rule sets available for older versions. http://tams-www.informatik.uni-hamburg.de/vhdl/index.php?content=07-tools#grammar — Paebbels, Jan 04 '15 at 12:54
@Paebbels It is not even directly related to VHDL. It may be any HDL description. — Ankur Gautam, Jan 04 '15 at 13:12
What you need to be asking yourself (before you ask the rest of us :) ) is "how do I recognize the end of the VHDL text?" Without an answer to that question, it is impossible to write a computer program that answers it, in any language. — rici, Jan 04 '15 at 14:42
@rici I understand your concern, but it is not about detection, and it is not specifically VHDL. It is something embedded between two pragmas, `pragma begin and `pragma end, and I need to get all the text between those two pragmas. Please assume I can easily detect the start and end of that text. What I want to know is how I can disable the rest of the rules while I am doing lexical analysis of that text. — Ankur Gautam, Jan 04 '15 at 16:17
@AnkurGautam: If you can detect the end of the text, then just write a regular expression which precisely matches the text up to the end. Then there is no problem. I don't see either `pragma begin` or `pragma end` in your example, and the example leads me to believe that the end of the text is marked by a `}` which matches the opening `{`, but in any event you need to be more precise. — rici, Jan 04 '15 at 16:20
@AnkurGautam: And, by the way, lexical scanning is *precisely* about detecting the end of scanned tokens. By definition. — rici, Jan 04 '15 at 16:21
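The start-conditions suggestion in the first comment boils down to a mode flag that decides which rules are even tried. A minimal C sketch of that idea follows. It is not flex-generated code, and the function and state names are made up for illustration. It assumes, as rici asks you to pin down, that the VHDL text ends at the first '}' after `VHDL_TEXT{`.

```c
#include <string.h>

/* Two scanner modes, mirroring flex's INITIAL and an exclusive %x VHDL. */
enum scan_state { ST_INITIAL, ST_VHDL };

/* Hypothetical example: counts `pragma directives, but only in ST_INITIAL.
 * Inside a VHDL_TEXT{...} block the pragma "rule" is simply never tried,
 * which is what an exclusive start condition does. Sketch only: the block
 * ends at the first '}' (brace nesting is not tracked here). */
int count_pragmas(const char *src)
{
    enum scan_state state = ST_INITIAL;
    int count = 0;
    const char *p = src;

    while (*p != '\0') {
        if (state == ST_INITIAL) {
            if (strncmp(p, "`pragma", 7) == 0) {
                count++;
                p += 7;
            } else if (strncmp(p, "VHDL_TEXT", 9) == 0) {
                const char *q = p + 9;
                while (*q == ' ' || *q == '\t')
                    q++;
                if (*q == '{') {          /* like BEGIN(VHDL) */
                    state = ST_VHDL;
                    p = q + 1;
                } else {
                    p += 9;
                }
            } else {
                p++;                      /* default rule: pass through */
            }
        } else {                          /* ST_VHDL: only '}' is special */
            if (*p == '}')
                state = ST_INITIAL;       /* like BEGIN(INITIAL) */
            p++;
        }
    }
    return count;
}
```

For the file in the question, a `pragma that happens to appear inside the VHDL text is not counted. Brace nesting and braces inside VHDL strings or comments are the harder part, which is what the answers below deal with.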

score 2 · Answer 1 · answered Jan 04 '15 at 16:37

As rici points out indirectly, you need to be able to distinguish between occurrences of the trailing delimiter '}' for your rule and occurrences of the right curly bracket in a valid VHDL design specification or portion of one.

See IEEE Std 1076-2008, 15.3 Lexical elements, separators, and delimiters, where we find that '{' and '}' are not used as delimiters in VHDL.

They are other special characters (15.2 Character set, using ISO/IEC 8859-1:1998), so they require handling wherever graphic characters may appear:

graphic_character ::=
    basic_graphic_character | lower_case_letter | other_special_character

These include extended identifiers (15.4.3), character literals (15.6), string literals (15.7), bit string literals (15.8), comments (15.9) and tool directives (15.11).

These lexical elements have to be identified within the production; otherwise a '}' inside one of them would be taken as the delimiter that closes the rule.

Only one tool directive is currently defined (24.1 Protect tool directives), and within it any use of the two curly bracket characters would be contained in VHDL lexical elements. All other uses in lexical elements are directly delimited. (You could also disclaim tool directive support; in VHDL, tool directives essentially invoke a separate lexical, syntactic and semantic analysis.)

Essentially, you need to run a VHDL lexical analyzer over the 'VHDL text', where your rule's closing right curly bracket will stand out like a sore thumb (as the one exception, serving as the closing delimiter for the VHDL text).

By now you may get the idea that life would be easier if you could handle the VHDL by reference instead, if possible. Your mechanism is as complex as including tool directives in VHDL (which can be done with a preprocessor, as could your VHDL text).
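To make this concrete, here is a minimal C sketch (function names are hypothetical) of the reduced VHDL lexing this answer calls for. It skips the lexical elements listed above that may legally contain braces, so the first '}' outside them is taken as the end of the VHDL text. Assumed simplifications: VHDL-2008 delimited comments and tool directives are not handled, and `X'Y'` is always read as a character literal (a real VHDL lexer resolves the apostrophe/attribute ambiguity from the previous token).

```c
#include <stddef.h>

/* Length of a VHDL lexical element starting at p that may legally contain
 * '{' or '}': a "--" comment, a string literal (which also covers the
 * quoted part of bit string literals such as X"1F"), a character literal,
 * or an extended identifier. Returns 0 if none starts at p. */
static size_t vhdl_lexeme_len(const char *p)
{
    size_t i;

    if (p[0] == '-' && p[1] == '-') {            /* comment to end of line */
        for (i = 2; p[i] != '\0' && p[i] != '\n'; i++)
            ;
        return i;
    }
    if (p[0] == '"' || p[0] == '\\') {           /* string or extended identifier */
        char q = p[0];
        for (i = 1; p[i] != '\0' && p[i] != '\n'; i++) {
            if (p[i] == q) {
                if (p[i + 1] == q) {             /* doubled delimiter: literal */
                    i++;
                    continue;
                }
                return i + 1;                    /* closing delimiter */
            }
        }
        return 0;                                /* unterminated: treat as plain char */
    }
    if (p[0] == '\'' && p[1] != '\0' && p[1] != '\n' && p[2] == '\'')
        return 3;                                /* character literal such as '}' */
    return 0;
}

/* Index of the '}' that closes the VHDL text in s, skipping the elements
 * above, or -1 if there is none. */
long find_vhdl_close(const char *s)
{
    size_t i = 0;

    while (s[i] != '\0') {
        size_t n = vhdl_lexeme_len(s + i);
        if (n > 0) {
            i += n;                              /* skip the whole lexeme */
            continue;
        }
        if (s[i] == '}')
            return (long)i;
        i++;
    }
    return -1;
}
```

With this, `s <= "}";`, `c := '}';`, `\name}x\` and a comment such as `-- {1,1,2}` no longer end the VHDL text early.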

This is in response to the vhdl tag added by FUZxxl.

Chris Dodd · Accepted Answer · 2015-01-04T19:16:16.127

1

When you have essentially different languages in a source file that you need to deal with that have clear demarcation tokens (like your VHDL_TEXT markers) that can be easily recognized by the lexer, the easiest thing to do is to use flex exclusive start states (%x). In your case, you would do something like:

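%{
/* some global vars for holding aux state */
/* Buffer, BufferClear, BufferAppend, BufferAppendString and
   BufferExtract are assumed helpers of your own, not part of flex */
static int brace_depth;
static Buffer vhdl_text;
%}

%x VHDL

%%

    /* ... normal lexer rules for your non-VHDL stuff go here ... */

VHDL_TEXT[ \t]*"{"  { brace_depth = 1;
                      BufferClear(vhdl_text);
                      BEGIN(VHDL); }
<VHDL>"{"           { BufferAppend(vhdl_text, *yytext);
                      brace_depth++; }
<VHDL>"}"           { if (--brace_depth == 0) {
                          BEGIN(INITIAL);
                          yylval.buf = BufferExtract(vhdl_text);
                          return VHDL_TEXT; }
                      BufferAppend(vhdl_text, *yytext); }
<VHDL>--.*\n        { /* comment: braces in it are not counted */
                      BufferAppendString(vhdl_text, yytext); }
<VHDL>\"([^"\n]|\"\")*\"   { /* string literal, doubled quotes allowed */
                      BufferAppendString(vhdl_text, yytext); }
<VHDL>\\([^\\\n]|\\\\)*\\  { /* extended identifier, doubled backslashes allowed */
                      BufferAppendString(vhdl_text, yytext); }
<VHDL>'[^\n]'       { /* character literal */
                      BufferAppendString(vhdl_text, yytext); }
<VHDL>.|\n          { BufferAppend(vhdl_text, *yytext); }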

This will gather up everything between the curly braces in VHDL_TEXT {...} and return it to your parser as a single token (matching nested braces properly, if there are any in the VHDL text). You can do macro-substitution-like things in the VHDL code by adding a rule like:

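<VHDL>{IDENT}       { /* IDENT needs a definition, e.g. IDENT [A-Za-z][A-Za-z0-9_]*
                         Macro and lookup_macro_by_name are your own helpers */
                      Macro *mac = lookup_macro_by_name(yytext);
                      if (mac) {
                          BufferAppendString(vhdl_text, mac->replacement);
                      } else {
                          BufferAppendString(vhdl_text, yytext);
                      } }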
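Outside flex, the same identifier-substitution idea could look roughly like the C sketch below. The macro table and function names are hypothetical stand-ins for `lookup_macro_by_name`. Unlike the flex version, which lets the comment and string rules win, this sketch does not skip comments or strings, and it compares names case-sensitively even though VHDL identifiers are case-insensitive.

```c
#include <ctype.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical macro table standing in for lookup_macro_by_name(). */
struct macro {
    const char *name;
    const char *replacement;
};

static const struct macro macros[] = {
    { "WIDTH",   "16"  },
    { "RST_LVL", "'0'" },
};

/* Returns the replacement for the identifier name[0..len), or NULL. */
static const char *lookup_macro(const char *name, size_t len)
{
    for (size_t i = 0; i < sizeof macros / sizeof macros[0]; i++) {
        if (strlen(macros[i].name) == len &&
            strncmp(macros[i].name, name, len) == 0)
            return macros[i].replacement;
    }
    return NULL;
}

/* Copies in to out (NUL-terminated), replacing every identifier
 * [A-Za-z][A-Za-z0-9_]* that names a macro. Other characters are copied
 * unchanged. Returns the output length, or -1 if out is too small. */
long substitute_macros(const char *in, char *out, size_t outsz)
{
    size_t len = 0;

    if (outsz == 0)
        return -1;
    while (*in != '\0') {
        size_t n = 1;               /* input characters consumed */
        const char *piece = in;     /* text to emit */
        size_t piece_len = 1;

        if (isalpha((unsigned char)*in)) {
            while (isalnum((unsigned char)in[n]) || in[n] == '_')
                n++;
            const char *rep = lookup_macro(in, n);
            piece = rep ? rep : in;
            piece_len = rep ? strlen(rep) : n;
        }
        if (len + piece_len >= outsz)
            return -1;              /* leave room for the NUL */
        memcpy(out + len, piece, piece_len);
        len += piece_len;
        in += n;
    }
    out[len] = '\0';
    return (long)len;
}
```

Scanning whole identifiers before the lookup (the way `{IDENT}` does) is what keeps `WIDTHX` from being partly replaced.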

You also probably want a <VHDL><<EOF>> rule to detect a missing closing } on the vhdl text and give an appropriate error message.
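The brace_depth bookkeeping and the missing-'}' check can be sketched in plain C as follows (hypothetical function name). It mirrors only the `{`, `}` and catch-all rules above, not the comment, string, identifier or character literal rules. Its final `return -1` is where the `<VHDL><<EOF>>` rule would report the error.

```c
#include <stddef.h>
#include <string.h>

/* Copies the body of the first VHDL_TEXT{...} block in src into out
 * (NUL-terminated) and returns its length. Nested '{' '}' pairs are kept
 * in the body. Returns -1 if there is no block, the body does not fit, or
 * the input ends before the matching '}' (the <<EOF>> case). */
long extract_vhdl_block(const char *src, char *out, size_t outsz)
{
    const char *p = strstr(src, "VHDL_TEXT");
    size_t len = 0;
    int depth;

    if (p == NULL || outsz == 0)
        return -1;
    p += 9;                              /* skip "VHDL_TEXT" */
    while (*p == ' ' || *p == '\t')
        p++;
    if (*p != '{')
        return -1;
    p++;
    depth = 1;                           /* brace_depth = 1 */

    for (; *p != '\0'; p++) {
        if (*p == '{') {
            depth++;
        } else if (*p == '}' && --depth == 0) {
            out[len] = '\0';             /* matching '}' found: done */
            return (long)len;
        }
        if (len + 1 >= outsz)
            return -1;                   /* no room for char plus NUL */
        out[len++] = *p;                 /* BufferAppend */
    }
    return -1;                           /* end of input: missing '}' */
}
```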

edited Jan 04 '15 at 19:16

answered Jan 04 '15 at 17:00


Brace depth is likely not sufficient. None of the VHDL lexical uses of curly brackets have any requirement for matching pairs. They are not VHDL delimiters. – Jan 04 '15 at 17:05
@DavidKoontz: Since VHDL apparently doesn't use braces at all, counting brace depth is probably unnecessary for VHDL, but is useful for other languages. – Chris Dodd Jan 04 '15 at 19:17
No, you missed the point. In VHDL, 'braces' can be present in the lexical elements extended_identifier, e.g. \name}name\, character_literal, e.g. '}', string_literal, e.g. "}", bit_string_literal, e.g. "1}0", or tool_directive (a preprocessor directive in which some other lexical element containing a closing 'brace' is present). The 'brace' characters are not safe 'VHDL text' delimiters unless the VHDL text is tokenized accurately. – Jan 04 '15 at 20:43
@DavidKoontz: True, but such uses of braces are generally nonsensical, so you can simply disallow them and say 'valid VHDL without spurious braces'. – Chris Dodd Jan 05 '15 at 00:03
I found 43 VHDL source files on my laptop, out of 3,956, containing one or more right curly brackets, as in the comment -- KEY SCHEDULE: {1,1,2,2,2,2,2,2,1,2,2,2,2,2,2,1}. This seems to be a common way to show set associativity. Instead of proclaiming Fizzbin rules, you could protect the VHDL text: double the right curly brackets and reduce them at the point of use, so that a single right curly bracket becomes the end-of-VHDL-text marker. You could also count the character length and specify that, ... A VHDL design specification can only be determined to be valid where it can be analyzed, elaborated and simulated (or synthesized). – Jan 05 '15 at 00:32
Simple, sensible things like balanced braces, comments and strings are easily dealt with (as above). Worrying about weird corner cases that are rarely or never significant is generally counter-productive. – Chris Dodd Jan 05 '15 at 01:13
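The brace-doubling protection suggested in the comments could be sketched like this in C. The helper names are hypothetical, and this is one possible way to implement the idea, not an established format. The writer doubles every '}' in the VHDL text, and the reader turns each "}}" back into '}' and stops at the first lone '}'.

```c
#include <stddef.h>

/* Hypothetical escaping scheme: inside the VHDL text every '}' is written
 * as "}}", so a lone '}' can only be the end-of-text marker. */

/* Doubles each '}' in in. Returns the output length, or -1 if out
 * (of size outsz, NUL included) is too small. */
long protect_vhdl(const char *in, char *out, size_t outsz)
{
    size_t len = 0;

    if (outsz == 0)
        return -1;
    for (; *in != '\0'; in++) {
        size_t need = (*in == '}') ? 2 : 1;
        if (len + need >= outsz)
            return -1;
        out[len++] = *in;
        if (need == 2)
            out[len++] = '}';            /* the doubled bracket */
    }
    out[len] = '\0';
    return (long)len;
}

/* Reads protected text up to the lone '}' that ends it, turning each "}}"
 * back into '}'. Returns the number of input chars consumed (including the
 * terminator), or -1 if the terminator is missing or out is too small. */
long unprotect_vhdl(const char *in, char *out, size_t outsz)
{
    size_t i = 0, len = 0;

    if (outsz == 0)
        return -1;
    while (in[i] != '\0') {
        if (in[i] == '}') {
            if (in[i + 1] != '}') {      /* lone '}': end of VHDL text */
                out[len] = '\0';
                return (long)(i + 1);
            }
            i++;                         /* "}}": keep only one '}' */
        }
        if (len + 1 >= outsz)
            return -1;
        out[len++] = in[i++];
    }
    return -1;                           /* no terminator found */
}
```

Because the writer always emits '}' in pairs, reading pairs left to right is unambiguous even when the VHDL text itself ends with '}'.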
