I'm trying to remove C-style comments, including their delimiters. It seems straightforward, but I'm spinning my wheels. This is what I'm trying:

$code =~ s/\/\*.+\*\//g;

However, the error I get is:

Substitution replacement not terminated

What to do?

Sinan Ünür
amphibient
  • possible duplicate of http://stackoverflow.com/questions/877470/how-can-i-strip-multiline-c-comments-from-a-file-using-perl – Tyler MacDonell Mar 22 '13 at 19:04
  • Because you haven't got enough slashes once you've accounted for the backslashes. You probably also need a non-greedy `.*?` in place of the `.+`; the `.+` rejects `/**/` as a C comment. Also remember that comments become spaces in the C preprocessor. See also Jeff Friedl's book [Mastering Regular Expressions](http://regex.info/) for detailed discussion of matching C comments. Note that if I write: `char c[] = "/*Not a comment*/";`, you should not be deleting the content of that string. You can also have (non-portably): `int c1 = '/*'; int c2 = '*/';` and there isn't any comment there, either. – Jonathan Leffler Mar 22 '13 at 19:09
  • yep `$doc =~ s#/\*.*?\*/##sg;` worked :) – amphibient Mar 22 '13 at 19:10
  • No, `$doc =~ s#/\*.*?\*/##sg;` doesn't work except in specific situations. Voting to close as duplicate because it looks like you are not content with just getting your typo fixed. If the topic is stripping comments from C source, then brian d foy's answer to the linked question is the right answer. – Sinan Ünür Mar 22 '13 at 19:12

3 Answers

A substitution takes the form:

s/FIND/REPLACE/FLAGS

Note that the delimiters can be any of many characters. Therefore when dealing with patterns that include slashes, it is often better to use a different delimiter, say #.

Your substitution is missing the replacement section. This is perhaps clearer if we use # instead of / as a delimiter:

$code =~ s#\/\*.+\*\/#g;

What you are probably intending to use is:

$code =~ s/\/\*.+\*\///g;

Or for clarity,

$code =~ s#/\*.+\*/##g;

Note that since # is being used as a delimiter instead, it is no longer necessary to escape /.
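As a rough illustration of the delimiter point, the sketch below applies the same substitution with three delimiter styles and checks that they agree. The variable names and the sample string are made up for the example, and the greedy `.+` is left as in the answer, so this only behaves well when there is a single comment on the line.

```perl
use strict;
use warnings;

# Hypothetical sample input with exactly one comment.
my $original = "int a; /* first */ int b;";

# The same substitution written with three delimiter styles.
(my $slashes = $original) =~ s/\/\*.+\*\///g;   # slashes must be escaped
(my $hashes  = $original) =~ s#/\*.+\*/##g;     # hash signs as delimiters
(my $braces  = $original) =~ s{/\*.+\*/}{}g;    # bracketing delimiters

print "$slashes\n$hashes\n$braces\n";
```

All three forms have an explicit, empty replacement part, which is what the original one-liner lacked.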

Jared Ng

You meant to write $code =~ s/\/\*.+\*\///g;

You don't have to use leaning toothpicks in your patterns:

$code =~ s{ /[*] .+ [*]/ }{}gx;

However, you'll probably find that this pattern does not handle all cases well. For that, see "How can I strip multiline C comments from a file using Perl?"

Sinan Ünür
  • Just like Jared's answer, this gave me only the part of the string AFTER the last comment ... not before. – amphibient Mar 22 '13 at 19:08
  • Of course it did: That's because `.+` is greedy. As I mentioned, your pattern will not work as intended. See brian d foy's answer to the linked question for the correct pattern. I changed the title (therefore the focus) of your question so it wouldn't be closed as a duplicate. – Sinan Ünür Mar 22 '13 at 19:10
  • What's distinctive about this question compared with the linked one? There's the trivial 'not enough slashes' part to the headline question (so it might be 'too localized'). All the deeper issues are covered to some extent in the other question. – Jonathan Leffler Mar 22 '13 at 19:18
  • @JonathanLeffler Well, it may not be distinctive enough but the new title addresses the actual problem shown in the question and provides an opening to show that one can use different pattern delimiters in Perl. Given that the OP is focusing on the stripping C comments part, I have already voted to close as a duplicate. – Sinan Ünür Mar 22 '13 at 19:21
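The greediness issue raised in these comments can be seen in a small sketch. The input string and variable names are invented for illustration; with two comments on one line, `.+` swallows everything between the first `/*` and the last `*/`, while `.*?` stops at the nearest `*/`.

```perl
use strict;
use warnings;

# Hypothetical input with two comments on the same line.
my $src = "a = 1; /* one */ b = 2; /* two */ c = 3;";

(my $greedy = $src) =~ s#/\*.+\*/##g;    # .+ runs on to the LAST */
(my $lazy   = $src) =~ s#/\*.*?\*/##g;   # .*? stops at the FIRST */

print "greedy: $greedy\n";   # the code between the comments is lost
print "lazy:   $lazy\n";     # only the comments are removed
```

Note that `.*?` also matches the empty comment `/**/`, which `.+` cannot.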

Per this thread, the following worked:

$code =~ s#/\*.*?\*/##sg;
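To show what the `s` flag contributes here, the sketch below runs the pattern with and without it on a comment that spans two lines. It also tries a single space as the replacement, since a comment sitting between two tokens would otherwise glue them together. The input and variable names are assumptions for the example only.

```perl
use strict;
use warnings;

# Hypothetical input: a two-line comment directly between two tokens.
my $src = "int/* spans\n   two lines */x;";

(my $no_s   = $src) =~ s#/\*.*?\*/##g;     # without /s, . cannot cross the newline
(my $with_s = $src) =~ s#/\*.*?\*/##sg;    # with /s the comment is removed
(my $spaced = $src) =~ s#/\*.*?\*/# #sg;   # a space keeps the tokens apart

print "no /s:   $no_s\n";
print "with /s: $with_s\n";
print "spaced:  $spaced\n";
```

Without `/s` the string comes back unchanged; with an empty replacement `int` and `x` are fused into `intx`.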

amphibient
  • Not my down-vote, but that is a potentially dangerous over-simplification for comment removal. Strings, character constants, and 'replacement should be a space' are all factors that might lead to this being down-voted. – Jonathan Leffler Mar 22 '13 at 19:20
  • Why is this more confusing than any other pattern match that doesn't include *s? I have done similar matches hundreds of times, and they didn't involve convoluted regexes (like removing everything inside (), etc.). – amphibient Mar 22 '13 at 19:25
  • Part of the art of using regexes is knowing the context in which it must be used, and the likely data. If your code is nice and simple and doesn't have weird corner cases, then that regex will do (though you'd still be safer with a space instead of nothing). It is not, however, a general purpose 'match C `/*...*/` comments only when they are comments' regex. I won't even bore you with backslash-newline appearing between the `/` and `*` or between `*` and `/`, or with trigraphs `??/` for ``\``, which are issues that a C compiler (preprocessor) has to deal with in support of the standard. – Jonathan Leffler Mar 22 '13 at 19:29
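
One possible way to address the string and character-constant cases mentioned in the comments is to match those literals first and put them back unchanged, replacing only what is left over as a comment. The sketch below is an assumption-laden illustration, not the pattern from the linked question: it ignores `//` comments, backslash-newline splices and trigraphs, and the sample C text is invented.

```perl
use strict;
use warnings;

# Hypothetical C input: one fake comment inside a string, one real comment.
my $code = <<'END_C';
char c[] = "/*Not a comment*/";
int x = 1; /* a real comment */
END_C

$code =~ s{
    ( "(?:\\.|[^"\\])*"      # group 1: a string literal, escapes included
    | '(?:\\.|[^'\\])*' )    #          or a character constant
  | /\* .*? \*/              # otherwise: a block comment
}{ defined $1 ? $1 : ' ' }gsex;   # keep literals, turn comments into a space

print $code;
```

Because the alternation tries the literal first at each position, a `/*` inside quotes is consumed as part of the string and never seen as a comment opener.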