How can I use a regexp to replace something in code, but only where it is not inside a comment?

..PATTERN.. PATTERN // .. PATTERN ..

to

..ANOTHER.. ANOTHER // .. PATTERN ..

Comments can be // or /* */

The regexp I use to find comments is:

/\*(.|[\r\n])*?\*/|(//.*)
Bart Kiers
newGuest
  • Your regex to find a comment will mess up the following source: `string s = "... // ...";`. – Bart Kiers Jun 27 '11 at 19:54
  • @Gumbo, I would assume C# because it's tagged as C#. – Corey Ogburn Jun 27 '11 at 19:55
  • @Corey Ogburn, yeah, probably, but the OP could be processing C# code with some other tool/language... – Bart Kiers Jun 27 '11 at 19:56
  • To be more precise: What language do you want to remove these comments from? – Gumbo Jun 27 '11 at 19:56
  • C# or C++. Comments can be found with /\*(.|[\r\n])*?\*/|(//.*) as described in this article: http://ostermiller.org/findcomment.html – newGuest Jun 27 '11 at 20:04
  • To be more concrete: I want to remove the namespace qualifiers in front of types whose namespaces are already declared with "using", but leave them in comments. I need this to properly compare the code with another version of it (SVN). – newGuest Jun 27 '11 at 20:07
  • Take a look at [my answer to a similar question](http://stackoverflow.com/questions/4278739/regular-expression-for-clean-javascript-comments-of-type/4278816#4278816). This might also be applicable to C#/C++. – Gumbo Jun 27 '11 at 20:09
  • @newGuest, that article ends with a "Caveats" paragraph explaining when it doesn't work (as my example `string s = "... // ...";` clearly shows) – Bart Kiers Jun 27 '11 at 20:26
  • I remember tackling this problem in my compiler construction class... it wasn't exactly trivial. We couldn't do this with regexp due to the issue Bart described. We needed to define states in the lexer... and nested comments were an issue. – Roly Jun 27 '11 at 20:29

2 Answers

0

Just write a simple two-state scanner (REGULAR_TOKEN, COMMENT_TOKEN, ...).

Then, for each REGULAR_TOKEN, do a straightforward replace, and leave the COMMENT_TOKENs untouched. Once again, I recommend Boost Spirit.

If you specify your goal/problem in more detail, I might whip up an example.
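
As a rough illustration of the scanner idea (not the answerer's code, and hand-written C# rather than Boost Spirit), the sketch below walks the source once, copies comments verbatim, and applies a plain string replace to everything else. The class and method names are hypothetical. It assumes C-style `//` and `/* */` comments and ordinary quoted literals with backslash escapes; C# verbatim strings (`@"..."`) and C++ raw strings are not handled. String literals count as code here, so the replacement is applied inside them too.

```csharp
using System;
using System.Text;

// Hypothetical helper for illustration only.
public static class CommentAwareReplacer
{
    public static string ReplaceOutsideComments(string source, string target, string replacement)
    {
        var result = new StringBuilder();
        var code = new StringBuilder(); // pending non-comment text (REGULAR_TOKEN)
        int i = 0;

        while (i < source.Length)
        {
            char c = source[i];
            char next = i + 1 < source.Length ? source[i + 1] : '\0';

            if (c == '/' && (next == '/' || next == '*'))
            {
                // Flush the code collected so far, with the replacement applied.
                result.Append(code.ToString().Replace(target, replacement));
                code.Clear();

                // COMMENT_TOKEN: find its end and copy it unchanged.
                int end;
                if (next == '/')
                {
                    end = source.IndexOf('\n', i);
                    if (end < 0) end = source.Length;
                }
                else
                {
                    end = source.IndexOf("*/", i + 2, StringComparison.Ordinal);
                    end = end < 0 ? source.Length : end + 2;
                }
                result.Append(source, i, end - i);
                i = end;
            }
            else if (c == '"' || c == '\'')
            {
                // Quoted literal: scanned as one unit so that a "//" inside it
                // is not mistaken for the start of a comment.
                char quote = c;
                int end = i + 1;
                while (end < source.Length && source[end] != quote)
                {
                    if (source[end] == '\\') end++; // skip the escaped character
                    end++;
                }
                end = Math.Min(end + 1, source.Length);
                code.Append(source, i, end - i);
                i = end;
            }
            else
            {
                code.Append(c);
                i++;
            }
        }

        // Flush whatever code remains after the last comment.
        result.Append(code.ToString().Replace(target, replacement));
        return result.ToString();
    }
}
```

For example, `CommentAwareReplacer.ReplaceOutsideComments("Mogre.Foo x; // Mogre.Foo is bla", "Mogre.", "")` returns `Foo x; // Mogre.Foo is bla`. The target must be non-empty, because `String.Replace` throws on an empty search string.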

sehe
  • Here is my code example: `protected List mThumbs = new List(); //comments: Mogre.OverlayContainer is for bla-bla-bla` – newGuest Jun 27 '11 at 20:13
  • With this simple regex, `(.*?)Mogre\.`, I can remove the namespace, but it also removes it from the comment. – newGuest Jun 27 '11 at 20:16
0

You can do this fairly easily for single-line comments by using a negative lookbehind (though it is still prone to the problem of string literals of the form "....//....."):

using System;
using System.Text.RegularExpressions;

string target = "replace // don't replace";
var output = Regex.Replace(target, "(?<!//.*)replace", "new string");
Console.WriteLine(output); // prints: new string // don't replace

Maybe something like this could work for multi-line comments:

string target = 
    @"replace;
/*
* don't replace
*/
replace;
    ";

var output = Regex.Replace(target, @"(?<!./\*\s*)replace(?!\s*\*/)", "new string", RegexOptions.Singleline);

Console.WriteLine(output); 

output:

new string;
/*
* don't replace
*/
new string;
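
Both lookaround patterns above are fragile: the single-line one misfires after a string literal containing `//`, and the multi-line one only protects text that sits directly before `*/`. A commonly used alternative, sketched here under my own assumptions rather than taken from the answer, is to let the regex match comments and string literals first and hand them back unchanged through a `MatchEvaluator`, so that only matches outside them are replaced. The class and method names are hypothetical. Unlike the snippets above, this sketch also leaves string literals untouched, and it does not handle C# verbatim strings or char literals.

```csharp
using System;
using System.Text.RegularExpressions;

// Hypothetical helper for illustration only.
public static class RegexCommentSkipper
{
    // Alternatives are tried left to right at each position, so a block comment,
    // line comment, or double-quoted string is consumed whole before the target
    // gets a chance to match inside it.
    private const string SkipPattern =
        @"(?<skip>/\*.*?\*/|//[^\r\n]*|""(?:\\.|[^""\\])*"")";

    public static string ReplaceOutsideComments(string source, string target, string replacement)
    {
        // The target is escaped, so it is treated as literal text, not as a regex.
        string pattern = SkipPattern + "|" + Regex.Escape(target);

        return Regex.Replace(
            source,
            pattern,
            // Comments and strings are returned as-is; anything else is the target.
            m => m.Groups["skip"].Success ? m.Value : replacement,
            RegexOptions.Singleline); // lets ".*?" span the lines of a /* */ comment
    }
}
```

With this approach, a comment body such as `* don't replace SOMETHING` stays intact, because the whole `/* ... */` span is matched by the `skip` group before `replace` can match inside it.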

Roly
  • Thank you, I need to read up on advanced regex assertions. But your multi-line sample breaks once the comment line becomes "* don't replace SOMETHING". Also, in case you don't know it, there is an online tester at http://regexhero.net/tester/ that can generate .NET code. – newGuest Jun 27 '11 at 21:05