
I wrote a method to remove single line comments from a C++ source file:


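def stripRegularComments(text) {
  def builder = new StringBuilder()
  text.eachLine {
    // Position of the first "//" on this line, or -1 if there is none
    def singleCommentPos = it.indexOf("//")
    def process = true
    if (singleCommentPos > -1) {
      // Count quote characters (' and " together) that precede the "//";
      // an odd count is taken to mean the "//" sits inside a literal
      def counter = 0
      it.eachWithIndex { obj, i ->
        if ((obj == '\'') || (obj == '"'))
          counter++
        if (i == singleCommentPos) {
          process = ((counter % 2) == 1)
          if (!process)
            return  // note: exits only this closure call, not eachWithIndex
        }
      }

      if (!process) {
        // The "//" is outside quotes: keep only the text before it
        def line = it.substring(0, singleCommentPos)
        builder << line << "\n"
      } else {
        // The "//" is inside quotes: keep the whole line
        builder << it << "\n"
      }
    } else {
      builder << it << "\n"
    }
  }
  return builder.toString()
}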

And I tested it with:

println a.stripRegularComments("""
this is a test inside double quotes "//inside double quotes"
this is a test inside single quotes '//inside single quotes'
two// a comment?//other
single //comment
""")

It produces this output:

this is a test inside double quotes "//inside double quotes"
this is a test inside single quotes '//inside single quotes'
two
single

Are there some cases I'm missing?

Opal
Geo
  • Those are C++ comments. C uses `/*` and `*/` to delimit commented sections. – D.Shawley Oct 24 '09 at 16:19
  • Just a tip: you might want to look into regular expressions. – sharkin Oct 24 '09 at 16:25
  • I don't understand. The only case for // is from // to \n. What else could there be? – mk12 Oct 24 '09 at 16:31
  • And while you look at regexps, look at Perl. Perl is extremely powerful and easy when it comes to creating text manipulation scripts. – rsp Oct 24 '09 at 16:32
  • @Geo: if you are doing this for production use (say as part of a build system), you might want to look into writing a quick parser. Since you seem to be inclined toward Java, I would recommend taking a look at http://www.antlr.org/. Also look at all of the responses to http://stackoverflow.com/questions/877470/how-can-i-strip-multiline-c-comments-from-a-file-using-perl for some more ideas. – D.Shawley Oct 24 '09 at 18:12
  • I'd just like to know WHY you're stripping comments out of source code. It doesn't seem like that good an idea. I mean I think we'd all agree that comments in source are a good thing and should be encouraged. – Glen Oct 24 '09 at 18:17
  • @D.Shawley: C99 adopted C++-style comments. – AnT stands with Russia Oct 24 '09 at 18:34
  • I'm removing comments because I don't want to implement a full-scale parser. I need to do some light parsing on source files in order to generate some reports. – Geo Oct 24 '09 at 19:09
  • There are already many C++ parsers available - have a look at the ANTLR project. You need a parser to do reliable comment processing. – PoorLuzer Oct 24 '09 at 22:55
  • The nice thing about parsing is that you don't need to parse much more than trigraphs, digraphs, quoted strings, and comments to strip them. I agree that it isn't as easy as it could be, but I think that you are already finding out that a general purpose solution ends up with you writing a parser anyway since it is difficult to catch all of the cases with naive string parsing or REs. – D.Shawley Oct 25 '09 at 12:04
  • C++ doesn't have single-quoted strings; you can remove that test. – Motti Oct 25 '09 at 12:19

7 Answers


The fun ones are formed by trigraphs and line continuations. My personal favorite is:

/??/
* this is a comment *??/
/
D.Shawley
  • I haven't seen that in a source file yet. – Geo Oct 24 '09 at 16:25
  • @aviraldg: I hope no one would ever do such a thing, but it is within the Standard so it is legal. – D.Shawley Oct 24 '09 at 18:01
  • @Geo: oops... yes... this is a multiline comment since it relies on line continuation (backslash). I've found accidental line continuation using `??/` at the end of a comment before; basically, `// a comment??/` will comment out the following line. Technically a multiline comment, but completely by accident. – D.Shawley Oct 24 '09 at 18:05
  • You have a space character before the first `*`, which ruins the whole thing :) That space character should be removed for your comment to work as intended – AnT stands with Russia Oct 24 '09 at 19:20
  • @Andrey: D'oh... I really wish MarkDown didn't require the extra level of indenting for code. It's too easy to do that. Thanks. – D.Shawley Oct 25 '09 at 12:00
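The trigraph half of this can be checked by applying translation phase 1 by hand. Below is a minimal C++ sketch, not taken from any library (`replaceTrigraphs` is a hypothetical name), that maps each of the nine trigraphs to its character. For the example above it turns `/??/` into `/\` and `*??/` into `*\`; phase 2 then deletes each backslash together with the newline after it, leaving `/* this is a comment */`. Trigraphs were removed in C++17, so this only applies to older language modes.

```cpp
#include <cstddef>
#include <string>

// Sketch of translation phase 1 trigraph replacement (pre-C++17 rules).
// A trigraph is two question marks followed by one of = / ' ( ) ! < > -
std::string replaceTrigraphs(const std::string& src) {
    std::string out;
    out.reserve(src.size());
    for (std::size_t i = 0; i < src.size(); ++i) {
        if (src[i] == '?' && i + 2 < src.size() && src[i + 1] == '?') {
            char replacement = '\0';
            switch (src[i + 2]) {
                case '=':  replacement = '#';  break;
                case '/':  replacement = '\\'; break;
                case '\'': replacement = '^';  break;
                case '(':  replacement = '[';  break;
                case ')':  replacement = ']';  break;
                case '!':  replacement = '|';  break;
                case '<':  replacement = '{';  break;
                case '>':  replacement = '}';  break;
                case '-':  replacement = '~';  break;
            }
            if (replacement != '\0') {
                out += replacement;
                i += 2;  // skip the second '?' and the final character
                continue;
            }
        }
        out += src[i];  // not a trigraph: copy the character unchanged
    }
    return out;
}
```

Scanning left to right also explains me22's example further down: in a run like `?????/` only the last three characters form a trigraph, so the line ends in a backslash and the next line is spliced into the comment.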
// Single line comments can\
actually be multi line.
Pumpuli
  • I like how the syntax highlighter failed on this one. – Kawa Oct 24 '09 at 19:28
  • @Kawa: that’s a limitation of the flawed approach to syntax highlighting that Stack Overflow is taking. :-( Even for much more conventional code the results are flaky. – Konrad Rudolph Oct 25 '09 at 12:17

I don't think you can handle:

  puts("Test \
    // not a comment");

and this is also likely to cause problems:

  puts("'"); // this is a comment
Rüdiger Hanke

You don't seem to handle escaped quotes, like:

"Comment\"//also inside string"

versus

"Comment"//not inside string"
rsp
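Both this case and the `puts("'");` line in the previous answer come from counting `'` and `"` as interchangeable. A minimal C++ sketch of the alternative follows; `findLineComment` is a hypothetical helper, and it assumes the line has already been spliced and contains no raw string literals, no C++14 digit separators such as `1'000`, and no `/* */` comments. It remembers which quote character opened a literal, closes the literal only on that same character, and treats a backslash inside a literal as escaping the next character.

```cpp
#include <cstddef>
#include <string>

// Returns the index where a "//" comment starts on one logical line, or
// std::string::npos if there is none. String and character literals are
// tracked separately, and a backslash inside either escapes the next char.
// Assumes: already spliced, no raw strings, no digit separators, no /* */.
std::size_t findLineComment(const std::string& line) {
    char quote = '\0';  // '"' or '\'' while inside a literal, else '\0'
    for (std::size_t i = 0; i < line.size(); ++i) {
        char c = line[i];
        if (quote != '\0') {
            if (c == '\\') {
                ++i;  // skip the escaped character (e.g. the quote in \")
            } else if (c == quote) {
                quote = '\0';  // closing delimiter of the same kind
            }
        } else if (c == '"' || c == '\'') {
            quote = c;  // opening delimiter; remember which kind it was
        } else if (c == '/' && i + 1 < line.size() && line[i + 1] == '/') {
            return i;  // "//" outside any literal starts a comment
        }
    }
    return std::string::npos;
}
```

On `"Comment\"//also inside string"` it finds no comment, on `"Comment"//not inside string"` it returns 9, and on `puts("'"); // this is a comment` it returns 11.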

I think you are missing the /* comment */ case.

Darin Dimitrov
  • I know, I'm planning on doing that after I make sure I've covered the cases for // – Geo Oct 24 '09 at 16:22
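Adding `/* */` is mostly one more state in the same kind of scanner. The following minimal C++ sketch (`stripComments` is a hypothetical name) works on the whole text, assumes trigraphs and line splices have already been handled and that there are no raw string literals or digit separators, and follows the phase 3 rule that each comment is replaced by one space. String and character literals are copied through untouched, and whichever comment opener appears first wins, so `//` inside `/* */` and `/*` after `//` are not special.

```cpp
#include <cstddef>
#include <string>

// Sketch of translation phase 3 on already-spliced text: each /* ... */ or
// // ... comment becomes one space; literals are copied unchanged.
// Assumptions: no raw strings or digit separators; an unterminated /*
// simply runs to the end of the text.
std::string stripComments(const std::string& src) {
    std::string out;
    char quote = '\0';  // '"' or '\'' while inside a literal, else '\0'
    std::size_t i = 0;
    while (i < src.size()) {
        char c = src[i];
        if (quote != '\0') {  // inside a string or character literal
            out += c;
            if (c == '\\' && i + 1 < src.size()) {
                out += src[i + 1];  // copy the escaped character verbatim
                i += 2;
                continue;
            }
            if (c == quote) quote = '\0';
            ++i;
        } else if (c == '"' || c == '\'') {
            quote = c;
            out += c;
            ++i;
        } else if (c == '/' && i + 1 < src.size() && src[i + 1] == '*') {
            std::size_t end = src.find("*/", i + 2);
            i = (end == std::string::npos) ? src.size() : end + 2;
            out += ' ';
        } else if (c == '/' && i + 1 < src.size() && src[i + 1] == '/') {
            std::size_t end = src.find('\n', i + 2);
            i = (end == std::string::npos) ? src.size() : end;  // keep newline
            out += ' ';
        } else {
            out += c;
            ++i;
        }
    }
    return out;
}
```

For instance, `a/*x*/b` becomes `a b` rather than `ab`, which keeps the two tokens separate as the compiler would.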

Handling of the \ character at the end of a line happens in an earlier translation phase (phase 2) than the replacement of comments (phase 3). For this reason, a // comment can actually span more than one line in the original source file:

// This \
whole thing \
is actually \
a single comment

P.S. Oh... I see this is already posted. OK, I'll keep it alive just for mentioning phases of translation :)

AnT stands with Russia
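The effect of the phase order can be seen with a small C++ sketch on AnT's example above. Both helpers are hypothetical: `spliceLines` deletes every backslash-newline pair (phase 2, assuming `\n` line endings), and `stripLineComments` is a deliberately naive stand-in for phase 3 that ignores string literals, which is safe only because the example has none.

```cpp
#include <cstddef>
#include <string>

// Phase 2 sketch: delete every backslash that is immediately followed by a
// newline, joining physical lines into one logical line.
std::string spliceLines(const std::string& src) {
    std::string out;
    for (std::size_t i = 0; i < src.size(); ++i) {
        if (src[i] == '\\' && i + 1 < src.size() && src[i + 1] == '\n') {
            ++i;  // skip the newline as well
            continue;
        }
        out += src[i];
    }
    return out;
}

// Naive phase 3 stand-in: remove "//" up to (not including) the newline.
// Ignores string literals; fine for this example only.
std::string stripLineComments(const std::string& src) {
    std::string out;
    std::size_t i = 0;
    while (i < src.size()) {
        if (src[i] == '/' && i + 1 < src.size() && src[i + 1] == '/') {
            std::size_t end = src.find('\n', i);
            if (end == std::string::npos) break;  // comment runs to end of text
            i = end;  // resume at the newline so it is kept
        } else {
            out += src[i++];
        }
    }
    return out;
}
```

Splicing first leaves only `int x;` after the comment; stripping first removes just `// This \` and hands `whole thing is actually a single comment` to the compiler as code, which is essentially what the line-by-line Groovy method in the question does.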

This is always a favourite:

// Why doesn't this run?????????????????????/
foo(bar);
me22