
I'm attempting to remove C and C++ style comments from a string using a regular expression. I have found one for Perl that seems to do both:

s#/\*[^*]*\*+([^/*][^*]*\*+)*/|//([^\\]|[^\n][\n]?)*?\n|("(\\.|[^"\\])*"|'(\\.|[^'\\])*'|.[^/"'\\]*)#defined $3 ? $3 : ""#gse;

But I am unsure as to how to use this with a boost::regex code block, or what I need to do to transform it into a regular expression accepted by boost::regex.

FYI: I found the regular expression here: perlfaq6 and it seems to cover any case I would need.

I would prefer not to use boost::spirit::qi to do this, as it would add a great deal of time to compilation for the project.

EDIT:

std::string input = "hello /* world */ world";
boost::regex reg("(/\\*([^*]|(\\*+[^*/]))*\\*+/)|(//.*)");
input = boost::regex_replace(input, reg, "");

The shorter regex above does work; the longer one from perlfaq6 does not.

Alan Moore
nerozehl
  • 1
    Wouldn't it be easier to understand and maintain this by using two separate regexes? First, get rid of /* ... */ then get rid of // ... eol. – jmucchiello Feb 26 '12 at 02:13
  • 1
    @jmucchiello can't; they affect each other. – smparkes Feb 26 '12 at 02:19
  • 1
    The regular expression should work with Boost's regular expressions pretty much the same way it does with perl. Just create a `boost::regex` object and apply it to a `std::string` using `boost::regex_match()` or `boost::regex_search()` to get a `boost::smatch`. What exactly are you struggling with? – Dietmar Kühl Feb 26 '12 at 02:25
  • The format of the regular expression. I don't know if it will work exactly like that right out of the box. – nerozehl Feb 26 '12 at 02:37
  • @smparkes: Why on earth would one affect the other? Do /**/ first, then // second... –  Feb 26 '12 at 02:51
  • 2
    Also op. Why on earth would you do this? And why are you doing it in C++? -2 plz (note downvote, programming points :(. This is too weird) –  Feb 26 '12 at 02:53
  • Is the string you're modifying actually C or C++? If so I anticipate that you will not be able to make a regex that correctly matches comments the way a C or C++ parser will. – bames53 Feb 26 '12 at 03:04
  • @nerozehl: did you give it a try? If you are unsure, think about what the regular expression does and see if the components map 1-to-1 to Boost's notation. Also, Boost supports Perl regular expressions. What is your concrete problem? What failed? What code did you try? – Dietmar Kühl Feb 26 '12 at 03:05
  • 2
    Why don't you use the preprocessor to do this? `gcc -E`, etc... – johnsyweb Feb 26 '12 at 03:16
  • 1
    Boost.Regex is not capable of this; [Boost.Xpressive](http://www.boost.org/libs/xpressive/) is. – ildjarn Feb 26 '12 at 03:21
  • I'm stripping comments from a non C++ file that I would like to have comments in it that are in C/C++ format. – nerozehl Feb 26 '12 at 03:22
  • 2
    @acidzombie24: Of course they can interact. You can comment out a `/*` or a `*/` by using a `//`. – Oliver Charlesworth Feb 26 '12 at 03:34
  • @acidzombie24 e.g., `// /* \n a = 10; /* */` – smparkes Feb 26 '12 at 03:51
  • @OliCharlesworth can't really comment out a `*/`. – smparkes Feb 26 '12 at 03:53
  • @OliCharlesworth I think you're right. I have never seen a `///*`, although I have seen a `//*/`, where the `//` is always ignored because of the `/*` earlier in the file. +1 on your comment –  Feb 26 '12 at 20:52
  • @smparkes, the issue of // /* is part of the regex for doing /* */ first. But that doesn't change the fact that doing them separately is probably easier to read. – jmucchiello Mar 09 '12 at 03:45

2 Answers


It seems a bit strange to use a regex for this when Boost already has a C++ preprocessor library (Boost.Wave) whose lexer can be used to strip comments.

#include <string>
#include <boost/wave.hpp>
#include <boost/wave/cpplexer/cpp_lex_token.hpp>
#include <boost/wave/cpplexer/cpp_lex_iterator.hpp>

// Lex the input with Boost.Wave and copy every token except comments.
std::string strip_comments(std::string const& input) {
    std::string output;
    typedef boost::wave::cpplexer::lex_token<> token_type;
    typedef boost::wave::cpplexer::lex_iterator<token_type> lexer_type;
    typedef token_type::position_type position_type;

    position_type pos;

    lexer_type it = lexer_type(input.begin(), input.end(), pos,
        boost::wave::language_support(
            boost::wave::support_cpp|boost::wave::support_option_long_long));
    lexer_type end = lexer_type();

    for (; it != end; ++it) {
        // Skip /* ... */ (T_CCOMMENT) and // ... (T_CPPCOMMENT) tokens.
        if (*it != boost::wave::T_CCOMMENT
         && *it != boost::wave::T_CPPCOMMENT) {
            output += std::string(it->get_value().begin(), it->get_value().end());
        }
    }
    return output;
}
Mankarse

If `\*` becomes `\\*` inside a C++ string literal, then why doesn't `[^\\]` become `[^\\\\]`?
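To make the escaping point concrete, here is a minimal sketch of the perlfaq6 pattern written two ways: once as a C++11 raw string literal (copied verbatim from the question) and once as an ordinary literal with every backslash doubled and every `"` escaped. It uses `std::regex` as a stand-in so that it compiles without Boost; `boost::regex` and `boost::regex_replace` are called with the same arguments, but that port is an assumption and has not been verified here. The names `kPatternRaw`, `kPatternEscaped`, and `strip_comments_regex` are hypothetical.

```cpp
#include <regex>
#include <string>

// The perlfaq6 pattern, verbatim, as a raw string literal (no extra escaping).
const std::string kPatternRaw =
    R"re(/\*[^*]*\*+([^/*][^*]*\*+)*/|//([^\\]|[^\n][\n]?)*?\n|("(\\.|[^"\\])*"|'(\\.|[^'\\])*'|.[^/"'\\]*))re";

// The same pattern as an ordinary literal: every backslash is doubled,
// so the regex-level [^\\] has to be written as [^\\\\].
const std::string kPatternEscaped =
    "/\\*[^*]*\\*+([^/*][^*]*\\*+)*/|//([^\\\\]|[^\\n][\\n]?)*?\\n|"
    "(\"(\\\\.|[^\"\\\\])*\"|'(\\\\.|[^'\\\\])*'|.[^/\"'\\\\]*)";

// Hypothetical helper: removes /* */ and // comments, keeps string and
// character literals. Group 3 holds the text to keep; it is unmatched (and
// therefore expands to nothing) when a comment alternative matched, which
// plays the role of Perl's `defined $3 ? $3 : ""`.
std::string strip_comments_regex(const std::string& input) {
    static const std::regex re(kPatternRaw);
    // Note: in std::regex's default grammar '.' does not match a newline.
    // Characters that no alternative matches are copied through unchanged.
    return std::regex_replace(input, re, "$3");
}
```

With this sketch, `strip_comments_regex("hello /* world */ world")` yields `hello  world` (two spaces), and a `//` comment is removed together with its trailing newline, as in the Perl original. The nested quantifiers can backtrack heavily, so this is only suitable for small inputs.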
user1227804