
I am working on a preprocessor that is analyzing a DSL. My goal is to remove the comments. The block comment facility is demarcated by %% before and after. I do not have to worry about %% being in strings, by the definition of the language.

I am using this s/// regex. Unfortunately, it seems to match everything and wipe it out:

#Remove multiline comments.
$text_string =~ s/%%.*%%//msg;

What am I doing wrong?

brian d foy
Paul Nathan

3 Answers


The first thing you can do is make the quantifier non-greedy:

.*?

Otherwise, with the greedy .* the match runs from the first %% to the last %%, and

%% some text %%

real content

%% other text %%

will all be wiped out.
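
To make the difference concrete, here is a minimal sketch that runs both forms of the substitution on the sample above. The variable names $greedy and $lazy and the sample string are illustrative assumptions, not part of the original question. The /s flag is what lets . cross newlines; the /m flag in the question only affects ^ and $, so it is left out here.

```perl
use strict;
use warnings;

# Illustrative input: two block comments with real content between them.
my $text_string = "%% some text %%\nreal content\n%% other text %%\n";

# Greedy: .* extends to the LAST %% in the string, so the real content
# between the two comments is removed as well.
(my $greedy = $text_string) =~ s/%%.*%%//sg;

# Non-greedy: .*? stops at the NEAREST closing %%, so each comment is
# removed on its own and the content between them is kept.
(my $lazy = $text_string) =~ s/%%.*?%%//sg;

print "greedy: [$greedy]\n";
print "lazy:   [$lazy]\n";
```

Only the line breaks that followed each comment remain where the comments were; whether to strip those as well depends on the DSL.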

nonopolarity

From perlfaq6: What does it mean that regexes are greedy? How can I get around it?


Most people mean that greedy regexes match as much as they can. Technically speaking, it's actually the quantifiers (?, *, +, {}) that are greedy rather than the whole pattern; Perl prefers local greed and immediate gratification to overall greed. To get non-greedy versions of the same quantifiers, use (??, *?, +?, {}?).

An example:

$s1 = $s2 = "I am very very cold";
$s1 =~ s/ve.*y //;      # I am cold
$s2 =~ s/ve.*?y //;     # I am very cold

Notice how the second substitution stopped matching as soon as it encountered "y ". The *? quantifier effectively tells the regular expression engine to find a match as quickly as possible and pass control on to whatever is next in line, like you would if you were playing hot potato.

brian d foy

Assuming that you have read the entire source into the variable $str, and that a single % can never occur between the opening %% and the closing %%, you could use this:

$str =~ s/%%([^%]+)%%//g;
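
The sketch below compares this negated character class with the non-greedy form on three made-up inputs, to show where the stated assumption matters. The helper names strip_negated and strip_lazy and the sample strings are hypothetical. The capture group is dropped because the captured text is never used.

```perl
use strict;
use warnings;

# Hypothetical helper: the negated-class pattern from this answer.
sub strip_negated {
    my ($str) = @_;
    $str =~ s/%%[^%]+%%//g;
    return $str;
}

# Hypothetical helper: the non-greedy pattern, for comparison.
sub strip_lazy {
    my ($str) = @_;
    $str =~ s/%%.*?%%//sg;
    return $str;
}

my $plain   = "keep %% drop %% keep";   # ordinary comment
my $percent = "x %% 50% done %% y";     # single % inside the comment
my $empty   = "a %%%% b";               # empty comment

for my $input ($plain, $percent, $empty) {
    printf "%-22s negated: [%s]  lazy: [%s]\n",
        $input, strip_negated($input), strip_lazy($input);
}
```

On these inputs the two patterns agree for the ordinary comment. The negated class leaves the other two strings unchanged, because [^%]+ cannot step over a lone % and needs at least one character; the non-greedy pattern removes both comments.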