
I have a Perl script that looks, for example, like this:

#!/usr/bin/perl -w

print 'My output: ';

print <<END;
Here is more content 
which is printed with
heredoc style
END

print 'End of output';

Now I want to extract the heredoc from the script above using JavaScript. The result should look like this:

<<END;
Here is more content 
which is printed with
heredoc style
END

I've tried the regex <<END(.|\n)*END. It works if the document contains only one heredoc, but not if it contains more than one.

For example, if my Perl script looks like this:

#!/usr/bin/perl -w

print 'My output: ';

print <<END;
Here is more content 
which is printed with
heredoc style
END

print <<END;
Here is even more content 
which is printed with
heredoc style
END

print 'End of output';

The regex matches:

<<END;
Here is more content 
which is printed with
heredoc style
END

print <<END;
Here is even more content 
which is printed with
heredoc style
END

But it should match

<<END;
Here is more content 
which is printed with
heredoc style
END

and

<<END;
Here is even more content 
which is printed with
heredoc style
END

Does anyone have an idea what is wrong with my regex?

Another question: is it possible, with a regex alone, to catch all heredocs regardless of their terminator string, rather than only those terminated by END?

Benjamin J.

1 Answer


The problem is that * is "greedy" by default: it matches as much as it can and only gives characters back when the rest of the pattern fails to match. In your case, (.|\n)* runs all the way to the end of your string and then backtracks just far enough to find an END, which is the last one in the document.
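To see the greedy behaviour on the two-heredoc script from the question, here is a minimal sketch. The names `perlSource`, `greedyMatch` and `openersInGreedyMatch` are made up for illustration; the input is the question's script with the shebang and first print omitted.

```javascript
// Sample input: the two-heredoc Perl script from the question.
const perlSource = [
  "print <<END;",
  "Here is more content",
  "which is printed with",
  "heredoc style",
  "END",
  "",
  "print <<END;",
  "Here is even more content",
  "which is printed with",
  "heredoc style",
  "END",
  "",
  "print 'End of output';",
].join("\n");

// Greedy: (.|\n)* consumes the rest of the string, then backtracks
// only as far as the LAST "END" it can find.
const greedyMatch = perlSource.match(/<<END(.|\n)*END/)[0];

// Count how many heredoc openers ended up inside that single match.
const openersInGreedyMatch = greedyMatch.split("<<END").length - 1;
console.log(openersInGreedyMatch); // 2: both heredocs were swallowed by one match
```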

To stop it from being greedy, so that it checks at every step whether it has reached the point where it should end (see what I did there? :D), add ? after *.

<<END(.|\n)*?END
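As a rough sketch of how this could be used from JavaScript: the lazy pattern needs the g flag to return every heredoc rather than just the first. The second pattern is one possible approach to the follow-up question, not a complete heredoc parser: it captures the terminator and reuses it via the backreference \1. It assumes a bare-word terminator immediately followed by ; and a closing line that contains only the terminator; quoted or indented heredocs are not handled. All variable names and the EOT heredoc in the sample are made up for illustration.

```javascript
// Sample input: the question's two END heredocs plus a hypothetical EOT one.
const perlSource = [
  "print <<END;",
  "Here is more content",
  "which is printed with",
  "heredoc style",
  "END",
  "",
  "print <<END;",
  "Here is even more content",
  "which is printed with",
  "heredoc style",
  "END",
  "",
  "print <<EOT;",
  "A heredoc with a different terminator",
  "EOT",
  "",
  "print 'End of output';",
].join("\n");

// Lazy quantifier plus the g flag: each match stops at the first "END"
// after its opener, so the two END heredocs come back separately.
const lazyMatches = perlSource.match(/<<END(.|\n)*?END/g);
console.log(lazyMatches.length); // 2

// Any terminator: capture the word after "<<" and require the same word,
// alone on a line, to close the heredoc (m flag makes ^ and $ line anchors).
const anyHeredoc = /<<(\w+);\n([\s\S]*?)^\1$/gm;
const allHeredocs = [...perlSource.matchAll(anyHeredoc)];

// m[0] is the whole heredoc, m[1] the terminator, m[2] the body.
const terminators = allHeredocs.map((m) => m[1]);
console.log(terminators); // [ 'END', 'END', 'EOT' ]
```

Note that the plain lazy pattern stops at the first occurrence of the letters END anywhere in the body, even inside another word, whereas the anchored version only accepts the terminator on a line of its own.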
Joseph