I'm trying to use ActiveState Tcl on a Windows PC to run the following Tcl script. It looks like the non-greedy match in #\(.*?\) is behaving greedily and matching into the following statements. Any idea what I'm doing wrong or how to fix this?


proc extract_verilog_instances {text} {

    set rexp {(\w+)\s+(\#\s*\((?:.*?)\)\s*)?(\w+(?:\[\d+\])?)\s*\(}

    # rexp will match any of the following statement types:
    #
    #   module_name instance_name ( 
    #   module_name instance_name[0] (
    #   module_name #(parameter1, parameter2) instance_name (
    #   module_name #(parameter1, parameter2) instance_name[0] (


    set regrun [regexp -inline -all -indices -expanded $rexp $text]

    foreach {m0 m1 m2 m3} $regrun {
        set start_index    [lindex $m0 0]
        set end_index      [lindex $m0 1]
        set module   [string range $text [lindex $m1 0] [lindex $m1 1]]
        set instance [string range $text [lindex $m3 0] [lindex $m3 1]]

        puts "module:$module instance:$instance"
    }
}

set vlog {
    
    second_module #(2) inst2 (.in2(sig2), .out2(sig3));

    third_module inst3 (.in3(sig3), .out3(sig4));

    fourth_module #(.in4_clk_freq(50), .in4_rst_val(1'b0)) inst4 (.in4_clk(clk), .in4_rst(rst), .in4_in1(sig4), .in4_in2(sig5), .out4(sig6));
}

extract_verilog_instances $vlog

Expected output:

module:second_module instance:inst2
module:third_module instance:inst3
module:fourth_module instance:inst4

Actual output:

module:second_module instance:inst4
toolic
Bill Moore

1 Answer

You can use

(\w+?)\s+(\#\s*\(.*\)\s*)?(\w+(?:\[\d+\])?)\s*\(

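In a Tcl regex, greediness is set by the first quantifier in the pattern. Your pattern starts with the greedy \w+, so the whole expression prefers the longest match and the later .*? runs on into the following statements. If you use \w+? as the first quantified subpattern instead, every subsequent + or * effectively behaves like +? or *?. Keep the # escaped as \# if you pass -expanded, as your code does, because an unescaped # starts a comment in expanded syntax.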
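
A minimal sketch of that rule, using a made-up test string (s, greedy_first and lazy_first are illustrative names, not part of the question). The two patterns differ only in the first quantifier:

```tcl
# Hypothetical sample text: two "name(args)" groups on one line.
set s "a(1) b(2)"

# The first quantifier (\w+) is greedy, so the whole RE prefers the
# longest match and the later .*? does not stop at the first ")".
regexp {\w+\(.*?\)} $s greedy_first

# The first quantifier (\w+?) is non-greedy, so the whole RE prefers
# the shortest match and stops at the first ")".
regexp {\w+?\(.*?\)} $s lazy_first

puts "greedy first: $greedy_first"
puts "lazy first:   $lazy_first"
```

On a Tcl 8.x interpreter the first match should cover the whole string, a(1) b(2), while the second should stop at a(1).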

If you want to test this regex in a PCRE-compliant regex tester, where each quantifier must be marked lazy individually, the pattern above should be written as

(\w+?)\s+?(#\s*?\(.*?\)\s*?)?(\w+?(?:\[\d+?\])??)\s*?\(

See the regex demo.

This regex works for you because \w+? at the start of the pattern behaves the same as \w+: it is followed by a mandatory \s, so it still has to consume the whole word. The remaining lazy subpatterns work for the same reason, since each is followed by a mandatory pattern (the final \( is especially important here).
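
As a sketch, here is the proc from the question with only the pattern changed. Two things are my own assumptions rather than part of the original code: -expanded is dropped, since the pattern contains no whitespace or comments, and the proc returns the module/instance pairs as a flat list so the result is easy to check.

```tcl
proc extract_verilog_instances {text} {
    # The leading \w+? makes the whole RE prefer the shortest match.
    # \# stays escaped so the pattern is also safe under -expanded.
    set rexp {(\w+?)\s+(\#\s*\(.*\)\s*)?(\w+(?:\[\d+\])?)\s*\(}

    set result {}
    foreach {m0 m1 m2 m3} [regexp -inline -all -indices $rexp $text] {
        # m2 (the optional parameter list) is {-1 -1} when absent; it is not used here.
        set module   [string range $text [lindex $m1 0] [lindex $m1 1]]
        set instance [string range $text [lindex $m3 0] [lindex $m3 1]]
        puts "module:$module instance:$instance"
        lappend result $module $instance
    }
    return $result
}

set vlog {
    second_module #(2) inst2 (.in2(sig2), .out2(sig3));

    third_module inst3 (.in3(sig3), .out3(sig4));

    fourth_module #(.in4_clk_freq(50), .in4_rst_val(1'b0)) inst4 (.in4_clk(clk), .in4_rst(rst), .in4_in1(sig4), .in4_in2(sig5), .out4(sig6));
}

set pairs [extract_verilog_instances $vlog]
```

With the sample input from the question, this should print one line per instance: inst2, inst3 and inst4, each paired with its own module.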

Wiktor Stribiżew