2

Can we format a std::regex pattern string with whitespace and line breaks that get ignored, just for better readability? Is there an option available like Python's re.VERBOSE?

Without verbose:

charref = re.compile("&#(0[0-7]+"
                     "|[0-9]+"
                     "|x[0-9a-fA-F]+);")

With verbose:

charref = re.compile(r"""
 &[#]                # Start of a numeric entity reference
 (
     0[0-7]+         # Octal form
   | [0-9]+          # Decimal form
   | x[0-9a-fA-F]+   # Hexadecimal form
 )
 ;                   # Trailing semicolon
""", re.VERBOSE)
Viatorus
  • I don't think so. You could use a raw string literal and pass it to another function that strips out its whitespace and then compiles it into a regex, but you'd have to write that stripping function yourself. – Cornstalks Jun 10 '16 at 14:16
  • You can split the string literal into multiple lines, like you show in your first example. You can have comments on those lines. – Igor Tandetnik Jun 10 '16 at 14:30

2 Answers

8

Simply split the string into multiple literals and use C++ comments like so:

std::regex rgx( 
   "&[#]"                // Start of a numeric entity reference
   "("
     "0[0-7]+"           // Octal form
     "|[0-9]+"           // Decimal form
     "|x[0-9a-fA-F]+"    // Hexadecimal form
   ")"
   ";"                   // Trailing semicolon
);

The compiler concatenates these literals into "&[#](0[0-7]+|[0-9]+|x[0-9a-fA-F]+);". Whitespace placed inside the quotes stays part of the pattern, so you can still match literal spaces where you need them. The drawback is that the additional quotation marks make the pattern somewhat laborious to write.
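To make the concatenation concrete, here is a minimal sketch of the same pattern wrapped in a helper. The names `kCharref` and `is_charref` are made up for this example and are not part of the answer above.

```cpp
#include <regex>
#include <string>

// Adjacent string literals are concatenated by the compiler, so this array
// holds exactly "&[#](0[0-7]+|[0-9]+|x[0-9a-fA-F]+);".
// kCharref and is_charref are hypothetical names chosen for this sketch.
const char kCharref[] =
    "&[#]"                // Start of a numeric entity reference
    "("
      "0[0-7]+"           // Octal form
      "|[0-9]+"           // Decimal form
      "|x[0-9a-fA-F]+"    // Hexadecimal form
    ")"
    ";";                  // Trailing semicolon

// Returns true if the whole input is a numeric entity reference.
bool is_charref(const std::string& s) {
    static const std::regex rgx(kCharref);
    return std::regex_match(s, rgx);
}
```

Because nothing is stripped, an input such as "&# 65;" does not match: the pattern contains no space at that position.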

muXXmit2X
5
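#include <algorithm>
#include <cctype>
#include <string>

inline std::string remove_ws(std::string in) {
  // std::isspace is overloaded and expects an unsigned char value, so wrap it in a lambda.
  in.erase(std::remove_if(in.begin(), in.end(),
                          [](unsigned char c) { return std::isspace(c) != 0; }),
           in.end());
  return in;
}

inline std::string operator""_nows(const char* str, std::size_t length) {
  return remove_ws({str, str+length});
}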

This doesn't support # comments yet, but adding that should be easy. Write a function that strips them from a string, and apply it before removing the whitespace:

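std::string remove_comments(std::string const& s)
{
  // A comment is a '#' preceded by whitespace, running to the end of the line.
  // Requiring the whitespace keeps the '#' inside "[#]" intact.
  std::regex comment_re("\\s#[^\n]*");
  return std::regex_replace(s, comment_re, "");
}
// remove_comments is a simple heuristic, not a full re.VERBOSE parser, but you get the idea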

std::string operator""_verbose(const char* str, std::size_t length) {
  return remove_ws( remove_comments( {str, str+length} ) );
}

Once finished, we get:

std::regex charref(R"---(
 &[#]                # Start of a numeric entity reference
 (
     0[0-7]+         # Octal form
   | [0-9]+          # Decimal form
   | x[0-9a-fA-F]+   # Hexadecimal form
 )
 ;                   # Trailing semicolon
)---"_verbose);

and done.
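Put together, the pieces above might look like the following self-contained sketch. It rests on one assumption that is not in the answer: a `#` starts a comment only when it follows whitespace, so that the `#` in `[#]` survives. The helper names `strip_comments`, `strip_whitespace` and `matches_verbose_charref` are hypothetical. Unlike Python's re.VERBOSE, this sketch removes all whitespace, including escaped whitespace and whitespace inside character classes.

```cpp
#include <algorithm>
#include <cctype>
#include <cstddef>
#include <regex>
#include <string>

// Assumption for this sketch: a comment is a '#' preceded by whitespace and
// runs to the end of the line. A '#' directly after '[' is left alone.
inline std::string strip_comments(const std::string& s) {
    static const std::regex comment_re("\\s#[^\n]*");
    return std::regex_replace(s, comment_re, "");
}

// Removes every whitespace character, including line breaks.
inline std::string strip_whitespace(std::string in) {
    in.erase(std::remove_if(in.begin(), in.end(),
                            [](unsigned char c) { return std::isspace(c) != 0; }),
             in.end());
    return in;
}

// User-defined literal: strips comments first, then whitespace.
inline std::string operator""_verbose(const char* str, std::size_t length) {
    return strip_whitespace(strip_comments(std::string(str, length)));
}

// Hypothetical helper that compiles the verbose pattern once and reuses it.
inline bool matches_verbose_charref(const std::string& s) {
    static const std::regex charref(R"---(
 &[#]                # Start of a numeric entity reference
 (
     0[0-7]+         # Octal form
   | [0-9]+          # Decimal form
   | x[0-9a-fA-F]+   # Hexadecimal form
 )
 ;                   # Trailing semicolon
)---"_verbose);
    return std::regex_match(s, charref);
}
```

Stripping comments before whitespace matters: once the line breaks are gone, there is no way to tell where a comment ends.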

Yakk - Adam Nevraumont