Removing C/C++ style comments using boost::regex

Question

I'm attempting to remove C and C++ style comments from a string using a regular expression. I have found one for Perl that seems to do both:

s#/\*[^*]*\*+([^/*][^*]*\*+)*/|//([^\\]|[^\n][\n]?)*?\n|("(\\.|[^"\\])*"|'(\\.|[^'\\])*'|.[^/"'\\]*)#defined $3 ? $3 : ""#gse;

But I am unsure as to how to use this with a boost::regex code block, or what I need to do to transform it into a regular expression accepted by boost::regex.

FYI: I found the regular expression here: perlfaq6 and it seems to cover any case I would need.

I would prefer not to use boost::spirit::qi to do this, as it would add a great deal of time to compilation for the project.

EDIT:

std::string input = "hello /* world */ world";

boost::regex reg("(/\\*([^*]|(\\*+[^*/]))*\\*+/)|(//.*)");

input = boost::regex_replace(input, reg, "");

So the shorter regex does indeed work; however, the longer one does not.

Wouldn't it be easier to understand and maintain this by using two separate regexes? First, get rid of /* ... */ then get rid of // ... eol. — jmucchiello, Feb 26 '12 at 02:13
The regular expression should work with Boost's regular expressions pretty much the same way it does with perl. Just create a `boost::regex` object and apply it to a `std::string` using `boost::regex_match()` or `boost::regex_search()` to get a `boost::smatch`. What exactly are you struggling with? — Dietmar Kühl, Feb 26 '12 at 02:25
The format of the regular expression. I don't know if it will work exactly like that right out of the box. — nerozehl, Feb 26 '12 at 02:37
@smparkes: Why on earth would one affect the other? Do /**/ first, then // second... — , Feb 26 '12 at 02:51
Also op. Why on earth would you do this? And why are you doing it in C++? -2 plz (note downvote, programming points :(. This is too weird) — , Feb 26 '12 at 02:53
Is the string you're modifying actually C or C++? If so I anticipate that you will not be able to make a regex that correctly matches comments the way a C or C++ parser will. — bames53, Feb 26 '12 at 03:04
@nerozehl: did you give it a try? If you are unsure, think about what the regular expression does and see if the components map 1-to-1 to Boost's notation. Also, Boost supports Perl regular expressions. What is your concrete problem? What failed? What code did you try? — Dietmar Kühl, Feb 26 '12 at 03:05
Why don't you use the preprocessor to do this? `gcc -E`, etc... — johnsyweb, Feb 26 '12 at 03:16
Boost.Regex is not capable of this; [Boost.Xpressive](http://www.boost.org/libs/xpressive/) is. — ildjarn, Feb 26 '12 at 03:21
I'm stripping comments from a non C++ file that I would like to have comments in it that are in C/C++ format. — nerozehl, Feb 26 '12 at 03:22
@acidzombie24: Of course they can interact. You can comment out a `/*` or a `*/` by using a `//`. — Oliver Charlesworth, Feb 26 '12 at 03:34
@OliCharlesworth I think you're right. I have never seen a `///*`, although I have seen a `//*/`, where the `//` is always ignored because of the `/*` earlier in the file. +1 on your comment — , Feb 26 '12 at 20:52
@smparkes, the issue of // /* is part of the regex for doing /* */ first. But that doesn't change the fact that doing them separately is probably easier to read. — jmucchiello, Mar 09 '12 at 03:45
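
The interaction discussed in the comments above can be seen on an input where a `//` hides a `/*`. This is only a sketch under a few assumptions: `std::regex` stands in for `boost::regex` so it builds without Boost, the function names `strip_two_pass` and `strip_one_pass` are made up for illustration, and `//[^\n]*` is used instead of `//.*` because, as far as I know, `.` in Boost.Regex also matches newlines unless `match_not_dot_newline` is passed.

```cpp
#include <regex>
#include <string>

// Two passes: remove /* ... */ first, then remove // ... up to end of line.
// The block pass cannot know that a "/*" sits inside a line comment.
std::string strip_two_pass(const std::string& input) {
    static const std::regex block(R"re(/\*([^*]|(\*+[^*/]))*\*+/)re");
    static const std::regex line(R"re(//[^\n]*)re");
    return std::regex_replace(std::regex_replace(input, block, ""), line, "");
}

// One pass: at each position, whichever comment opener comes first wins,
// so a "/*" that follows "//" on the same line is consumed as comment text.
std::string strip_one_pass(const std::string& input) {
    static const std::regex either(
        R"re((/\*([^*]|(\*+[^*/]))*\*+/)|(//[^\n]*))re");
    return std::regex_replace(input, either, "");
}
```

For the input `a // /* x\nb /* y */ c\n`, the two-pass version swallows the whole `b` line, because its block pass treats the commented-out `/*` as a real opener. The single-pass version keeps it. Neither version knows about string literals, which is what the longer perlfaq6 pattern adds.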

Mankarse · Answer 1 · 2012-10-05T01:35:25.193

It seems a bit strange to use a regex for this when Boost already has a C++ preprocessor library (Boost.Wave) that can be used to strip comments.

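#include <string>
#include <boost/wave.hpp>
#include <boost/wave/cpplexer/cpp_lex_token.hpp>
#include <boost/wave/cpplexer/cpp_lex_iterator.hpp>

// Lex the input with Boost.Wave and copy every token except comments.
std::string strip_comments(std::string const& input) {
    std::string output;
    typedef boost::wave::cpplexer::lex_token<> token_type;
    typedef boost::wave::cpplexer::lex_iterator<token_type> lexer_type;
    typedef token_type::position_type position_type;

    position_type pos;  // default position; no file name is attached

    lexer_type it = lexer_type(input.begin(), input.end(), pos,
        boost::wave::language_support(
            boost::wave::support_cpp|boost::wave::support_option_long_long));
    lexer_type end = lexer_type();

    for (;it != end; ++it) {
        // Skip both /* ... */ (T_CCOMMENT) and // ... (T_CPPCOMMENT) tokens.
        if (*it != boost::wave::T_CCOMMENT
         && *it != boost::wave::T_CPPCOMMENT) {
            output += std::string(it->get_value().begin(), it->get_value().end());
        }
    }
    return output;
}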

score 0 · Answer 2 · answered Feb 26 '12 at 03:28

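If `\*` becomes `\\*`, then why doesn't `[^\\]` become `[^\\\\]`? In a C++ string literal every backslash of the regex has to be doubled, including the ones that are already doubled in the Perl original.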


user1227804
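
To make the escaping point concrete, here is a minimal sketch of the perlfaq6 pattern written as a C++ raw string literal, which avoids the doubling altogether. It rests on several assumptions: `std::regex` is used as a stand-in so the example builds without Boost, the names `kPerlfaqPattern`, `strip_comments_regex`, `escaped_fragment` and `raw_fragment` are hypothetical, and the format string `"$3"` plays the role of Perl's `defined $3 ? $3 : ""` (an unmatched group expands to nothing). The same pattern and format string should carry over to `boost::regex` and `boost::regex_replace`, but treat that as something to check rather than a guarantee.

```cpp
#include <regex>
#include <string>

// The perlfaq6 pattern, copied verbatim into a raw string literal so that
// the C++ compiler does not consume any of the backslashes.
static const char* const kPerlfaqPattern =
    R"re(/\*[^*]*\*+([^/*][^*]*\*+)*/|//([^\\]|[^\n][\n]?)*?\n|("(\\.|[^"\\])*"|'(\\.|[^'\\])*'|.[^/"'\\]*))re";

// Hypothetical helper: comments match alternatives 1 or 2 and leave group 3
// unset, so "$3" replaces them with nothing; everything else is captured in
// group 3 and copied back unchanged.
std::string strip_comments_regex(const std::string& input) {
    static const std::regex re(kPerlfaqPattern);
    return std::regex_replace(input, re, "$3");
}

// The same regex fragment written both ways: with doubled backslashes in an
// ordinary literal, and verbatim in a raw literal.
std::string escaped_fragment() { return "[^\\\\]"; }
std::string raw_fragment() { return R"([^\\])"; }
```

Calling `strip_comments_regex("x // c\ny")` removes the comment together with its trailing newline, as the Perl original does, while a string such as `"/* no */"` is left alone. Note that a `//` comment with no newline after it is not removed, and that Perl's `/s` modifier makes `.` match newlines, which `std::regex` does not do by default.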
