
I am in the process of adapting the SystemVerilog LRM grammar to ANTLR4. This is huge overkill for what I really need, however. Basically, I need dependency analysis similar to the -M switch in gcc. This problem has been surprisingly difficult to solve: my current regex-based solution is incomplete and buggy, and it constantly breaks when exposed to new code, even though it has been patched many times. I have tried various freely available parsers, but none of them seem to handle code that conforms to the latest SystemVerilog (2012) standard.

I think I need a parser based approach, and I think I am stuck building my own parser. But I am very interested to hear any other suggestions about this. I can't be the only one who has this problem.

Here is my ANTLR question: I am attempting to use the "island in the stream" approach, where the ANTLR grammar ignores most of the details and complexity of the SystemVerilog language and parses only the code where modules are instantiated or headers are included. The obvious difficulty is distinguishing the code I care about from the code I don't. Has anyone used ANTLR this way (not necessarily for SystemVerilog)? I am hoping for a strategy for writing the "catch-all" rule that matches everything unrelated to module instances.

Thanks.

cdixit2
sean
  • That was one of the free options that I tried. It would have been perfect, except that not a single one of my testbenches would parse correctly. The errors made no sense and seemed to indicate problems conforming to the standard rather than in my code. – sean Jun 05 '15 at 01:43
  • I may do that. The trouble is that this doesn't seem to indicate a single problem that is likely to be fixed anytime soon. And I don't have a lot of parser or Perl experience, so fixing that code is not necessarily going to be easier than coming up with a new solution from scratch. – sean Jun 05 '15 at 16:38

1 Answer


The idiomatic strategy is to match what is wanted and let everything else be consumed by an 'other' rule. So the basic structure of the parser will be:

verilog     : statement+ EOF ;
statement   : header
            | module
            | <<etc>>
            | other
            ;
header      : INCLUDE filePathspec SEMI ;
filePathspec: <<whatever>> ;
module      : MODULE <<whatever>> SEMI ;
other       : . ;  // consume a single, uninteresting token at a time

The only requirement is to make the statement rules sufficiently detailed to uniquely match their statements. The Verilog syntax gives you that explicitly.
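
To make the "match what is wanted, otherwise consume one token" idea concrete, here is a minimal sketch in plain Python rather than ANTLR. It is only an illustration of the control flow, not generated parser code. The names `dependencies`, `match_include`, `match_instance` and the `KEYWORDS` list are hypothetical, and the keyword list is deliberately incomplete.

```python
import re

# Tiny lexer: comments and whitespace are dropped, everything else is a token.
TOKEN_RE = re.compile(r'''
    (?P<COMMENT>//[^\n]*|/\*.*?\*/)
  | (?P<STRING>"(?:\\.|[^"\\])*")
  | (?P<DIRECTIVE>`[A-Za-z_]\w*)
  | (?P<ID>[A-Za-z_][\w$]*)
  | (?P<WS>\s+)
  | (?P<PUNCT>.)
''', re.VERBOSE | re.DOTALL)

# Assumption: a small, incomplete keyword list, enough for the sample below.
KEYWORDS = {"module", "endmodule", "input", "output", "inout", "parameter",
            "wire", "reg", "logic", "int", "assign", "always", "always_ff",
            "always_comb", "begin", "end", "if", "else", "function", "task",
            "posedge", "negedge", "and", "or", "not"}

LPAREN, RPAREN = ("PUNCT", "("), ("PUNCT", ")")
HASH, SEMI = ("PUNCT", "#"), ("PUNCT", ";")

def tokenize(text):
    return [(m.lastgroup, m.group()) for m in TOKEN_RE.finditer(text)
            if m.lastgroup not in ("COMMENT", "WS")]

def is_name(tok):
    return tok[0] == "ID" and tok[1] not in KEYWORDS

def skip_parens(toks, i):
    """toks[i] is '('; return the index just past its matching ')', or None."""
    depth = 0
    while i < len(toks):
        if toks[i] == LPAREN:
            depth += 1
        elif toks[i] == RPAREN:
            depth -= 1
            if depth == 0:
                return i + 1
        i += 1
    return None

def match_include(toks, i):
    """`include "path" (no trailing semicolon). Returns (path, next) or None."""
    if (i + 1 < len(toks) and toks[i] == ("DIRECTIVE", "`include")
            and toks[i + 1][0] == "STRING"):
        return toks[i + 1][1].strip('"'), i + 2
    return None

def match_instance(toks, i):
    """name [#(...)] name (...) ;  Returns (module_name, next) or None."""
    if not is_name(toks[i]):
        return None
    j = i + 1
    if j + 1 < len(toks) and toks[j] == HASH and toks[j + 1] == LPAREN:
        j = skip_parens(toks, j + 1)
        if j is None:
            return None
    if j >= len(toks) or not is_name(toks[j]):
        return None
    j += 1
    if j >= len(toks) or toks[j] != LPAREN:
        return None
    j = skip_parens(toks, j)
    if j is None or j >= len(toks) or toks[j] != SEMI:
        return None
    return toks[i][1], j + 1

def dependencies(text):
    toks, includes, modules, i = tokenize(text), [], [], 0
    while i < len(toks):
        hit = match_include(toks, i)
        if hit is not None:
            includes.append(hit[0]); i = hit[1]; continue
        hit = match_instance(toks, i)
        if hit is not None:
            modules.append(hit[0]); i = hit[1]; continue
        i += 1  # the 'other' case: skip exactly one uninteresting token
    return includes, modules

SAMPLE = '''
`include "defs.svh"
module top (input logic clk, input logic a, output logic y);
  // fifo u_fake (.clk(clk));   commented out, so it must be ignored
  logic t;
  fifo #(.DEPTH(16)) u_fifo (.clk(clk), .din(a), .dout(t));
  always_ff @(posedge clk) if (t) y <= a;
  adder u_add (.a(a), .b(t), .y());
endmodule
'''
```

Calling `dependencies(SAMPLE)` returns `(['defs.svh'], ['fifo', 'adder'])`. Nothing here depends on semicolons to skip code: a token is skipped only when no interesting rule matches in full at that position. The sketch would still misread a declaration such as `function my_t f (input a);` as an instance, and the remedy is the one described above: add a rule detailed enough to claim that construct first.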

UPDATE

Take a look at the example Verilog grammar in the grammar archive.

GRosenberg
  • I appreciate the response. Unfortunately, I think the 'other' rule will have to be much more complex, or may not work at all. Many of the constructs that I want to skip do not end with a semicolon, and there are lots of constructs that look like a module instance but actually contain keywords. I know this from my sad experience of using a regex to do this matching. My current plan is to reformat the grammar published in the LRM and comment out the parts I don't need. – sean Jun 08 '15 at 14:34
  • You misunderstand how the other rule works. It does not depend on a semicolon: it consumes a single token at a time, unless another rule matches in its entirety. Similar but distinguishable constructs are just that: distinguishable. You just have to write the rule with sufficient detail to make the distinction. You might want to post an example of similar constructs to see how ANTLR can distinguish between them (ANTLR is really quite different from regexes). – GRosenberg Jun 08 '15 at 17:14
  • I am finally getting back to this, and I don't think my original strategy will work. I don't see how I can force the parser to ignore everything that is not a dependency without writing rules for all of those other constructs. So I am attempting to put the whole SystemVerilog LRM grammar into ANTLR4 syntax. – sean Jun 24 '15 at 22:37