
I want to match multiline comments that contain a specific word, say findthis. The first pattern that comes to mind is \/\*.*?findthis.*?\*\/ (using DOTALL). The problem with this pattern, however, is that for input like this:

/* this is a comment */
this is some text
/* this is a findthis comment */

it matches the whole text. On a bigger file, the first match would span everything from the start of the first comment to the end of the first comment containing findthis. How can I prevent this?
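To make the failure concrete, here is a minimal sketch that runs the pattern on the sample text. It assumes Python's re module (the question only mentions DOTALL, so the language is a guess), and the variable names are purely illustrative.

```python
import re

# Sample input from the question.
text = (
    "/* this is a comment */\n"
    "this is some text\n"
    "/* this is a findthis comment */"
)

# Same pattern as in the question; "/" needs no escaping in Python.
naive = re.compile(r"/\*.*?findthis.*?\*/", re.DOTALL)

naive_match = naive.search(text).group(0)

# The match starts at the first "/*" in the text, not at the comment
# that actually contains findthis.
print(naive_match == text)
```

The lazy quantifier only makes the match as short as possible from a given starting point; the search still begins at the first /* it encounters.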

Felix
  • You might want to read http://stackoverflow.com/questions/1618419/can-you-give-me-examples-of-odd-single-line-comments-in-c/ – D.Shawley Dec 10 '09 at 17:41

2 Answers


Well, you could change the regex to something like \/\*([^*]|\*+[^/*])*findthis([^*]|\*+[^/*])*\*+\/ but...

To get this exactly right, you would have to fully tokenize the source code. Otherwise your regex will be fooled by comment-like content inside strings (among other bizarre corner cases).

(Explanation of crazy regex: ([^*]|\*+[^/*]) matches a little bit of the inside of a comment, but never matches all or part of */.)
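A rough sketch of how this pattern behaves, assuming Python's re module (the thread does not name a language). INSIDE and the sample strings are illustrative names, not part of the original pattern, and the capturing groups are written as non-capturing (?:...) for convenience.

```python
import re

# One "safe" chunk of comment body, repeated: either a non-star character,
# or a run of stars followed by something that is neither "/" nor "*".
INSIDE = r"(?:[^*]|\*+[^/*])*"

# DOTALL is not needed: [^*] and [^/*] already match newlines.
strict = re.compile(r"/\*" + INSIDE + "findthis" + INSIDE + r"\*+/")

text = (
    "/* this is a comment */\n"
    "this is some text\n"
    "/* this is a findthis comment */"
)
strict_matches = [m.group(0) for m in strict.finditer(text)]
print(strict_matches)

# A multi-line comment with leading stars is still matched as one unit.
multiline = "/* line one\n * findthis here\n */"
multiline_match = strict.search(multiline).group(0)

# The caveat about tokenizing: the regex cannot tell that this "comment"
# sits inside a string literal, so it reports a match anyway.
in_string = 'char *s = "/* findthis */";'
false_positive = strict.search(in_string).group(0)
print(false_positive)
```

Because neither alternative can consume */, the part before findthis cannot run past the end of an earlier comment, so only the second comment in the sample is reported.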

Jason Orendorff

I think this should do the trick:

/\/\*.*?findthis.*?\*\//. The ? in .*? makes the quantifier non-greedy (lazy). This way the comment can contain * and / characters, but not */ (the end of the comment).

VDVLeon
  • That's exactly the same pattern that I posted (with two additional slashes at the beginning and end - because you are probably a PHP user). Have you tried this pattern on the example I provided? It will not work. – Felix Dec 11 '09 at 13:34