
I'm using the following regular expression:

$output = preg_replace( "/\/\/(.*)\\n/", "", $output );

The code works well, but when the input contains a URL such as http://this_is_not_a_comment.com/kickme, it also removes everything from the // in http:// to the end of the line.

How can I avoid replacing such URLs?

Thanks,

CRISHK Corporation
  • You need some kind of parser that can distinguish between code and comments. – Gumbo Nov 25 '10 at 15:46
  • You should look at this answer: http://stackoverflow.com/questions/1732348/regex-match-open-tags-except-xhtml-self-contained-tags/1732454#1732454 – Don Roby Nov 25 '10 at 15:53

2 Answers


You need a regular expression that can distinguish between code and comments. In particular, since the sequence // can appear either inside a string or at the start of a comment, you only need to distinguish between strings and comments.

Here’s an example that might do this:

/(?:([^\/"']+|\/\*(?:[^*]|\*+[^*\/])*\*+\/|"(?:[^"\\]|\\.)*"|'(?:[^'\\]|\\.)*')|\/\/.*)/

Using this pattern in a replace function, with the match of the first subpattern as the replacement, should remove the // style comments.

Some explanation:

  • [^/"']+ matches any run of characters that cannot begin a comment (either //… or /*…*/) or a string
  • /\*(?:[^*]|\*+[^*/])*\*+/ matches the /* … */ style comments
  • "(?:[^"\\]|\\.)*" matches a string in double quotes
  • '(?:[^'\\]|\\.)*' matches a string in single quotes
  • \/\/.* finally matches the //… style comments.

Because the first four constructs are grouped in a capturing group, the matched text is available, and nothing changes when the match is replaced with the match of the first subpattern. Only when a //… style comment is matched is the first subpattern empty, so the comment is replaced by an empty string.
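As a rough sketch of how this could be applied in PHP: the function name strip_line_comments and the sample input are assumptions for illustration, and the ~ delimiter is used only to avoid escaping the slashes.

```php
<?php
// Hypothetical helper; only the pattern itself comes from the answer.
function strip_line_comments(string $source): string
{
    // Group 1 captures ordinary code, /* ... */ comments and quoted strings.
    // The last alternative (//.*) sits outside the group, so $1 is empty
    // whenever a // comment is what matched.
    $pattern = '~([^/"\']+'
             . '|/\*(?:[^*]|\*+[^*/])*\*+/'
             . '|"(?:[^"\\\\]|\\\\.)*"'
             . '|\'(?:[^\'\\\\]|\\\\.)*\')'
             . '|//.*~';

    return preg_replace($pattern, '$1', $source);
}

// Made-up sample input: a URL inside a string, followed by a // comment.
$input  = "var url = \"http://this_is_not_a_comment.com/kickme\"; // a comment\nvar x = 1;\n";
$result = strip_line_comments($input);
// $result keeps the quoted URL and the line break, and drops "// a comment".
```

The URL survives here only because it sits inside a quoted string; a bare http://… outside quotes would still be cut at the //.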

But note that this may fail; I'm not sure that it works for every input.

Gumbo
  • Protip to OP: if a regular expression ever looks this hideous, it's probably not a job for regular expressions. Regardless, +1 for being able to even begin to construct something like this. – Matchu Nov 25 '10 at 15:57
  • @Matchu: I had to look up the regular expression for the `/* … */` style comments too. – Gumbo Nov 25 '10 at 16:03
  • Nice. Took me some time to get everything. In the two strings, I think the `\.` should be escaped quotes - `\\"`, or escape anything: `\\.`. Am I missing something? – Kobi Nov 25 '10 at 16:14
  • @Kobi: Oh yes, of course. Thanks! I just simplified it instead of checking for a valid string syntax too. – Gumbo Nov 25 '10 at 16:15
  • If this is for js you'd also need to think of regex quoting, eg `/foo\//i` – Qtax May 13 '11 at 09:01
  • This solution fails to consider literal regexes (which need to be considered when parsing JavaScript), e.g. it will mangle: `var re = /\/*notacomment!*/;` and `m = /\//.test("notacomment!")` and `var re = /\/*/; // */ thiscommentishandledasascode!` and `var re = /"/; // " thiscommentishandledasascode!` – ridgerunner Aug 14 '13 at 13:39
$output = preg_replace( "/(?<!\:)\/\/(.*)\\n/", "", $output );
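A small sketch of what this pattern does, assuming the URL contains a colon directly before the slashes, as in http:// (the sample input is made up for illustration). The negative lookbehind (?<!\:) rejects any // that immediately follows a colon:

```php
<?php
// Made-up input: one real // comment and one URL inside a string.
$output = "var a = 1; // set a\nvar url = 'http://this_is_not_a_comment.com/kickme';\n";

// Same pattern as above: match // only when it is not preceded by ":".
$output = preg_replace("/(?<!\:)\/\/(.*)\\n/", "", $output);
// The comment (and its line break) is removed; the URL is left alone.
```

Because the pattern requires a trailing \n, it also consumes the line break after the comment, and it will not match a comment on the last line if the input has no final newline.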
CRISHK Corporation