I am using the code below to try to match symbols with std::regex. As an example, I am trying to match the circled star symbol (http://graphemica.com/%E2%9C%AA).

#include <regex>
#include <iostream>

int main() {
  std::wsmatch matches;
  std::wstring x = L"✪";
  //  std::wregex e(L"(\\pS)+");
  std::wregex e(L"([[:S:]]+)");
  if (std::regex_match(x, matches, e))
  {
    // never reached
    std::cout << "Never reached";
  } 

  std::cout << "Bye.";

  return 0;
}

The symbol ✪ (U+272A) is not matched. I also tried other symbols (© for example), and none of them match.

I tried [:S:], \pS and \p{S}; none of them work (the last one throws an exception).

This is similar to a problem I had with the Boost library, although in a different namespace (Common symbols '\p{S}' not been 'matched' using boost wregex).

FFMG

1 Answer

Neither the ECMAScript 3rd ed. nor the POSIX regex grammars support Unicode category character classes. You can build equivalents yourself from \u- and \U-based character ranges, but hoping for things like \p{So} is a lost cause under the present specifications.

As I answered on your other question, if you really want to use them, Boost.Regex supports them through boost::u32regex when it is built with ICU support enabled. (PCRE and PCRE2 support them as well, but as with most C libraries, I hesitate to recommend them for new C++ code.)

ildjarn
  • Thanks for the answer; it makes sense now why none of them work. I am curious about the last part of your answer: I have used PCRE2 in the past (and I like it), so I wonder why you would not recommend it. – FFMG Jul 23 '16 at 15:21
  • @FFMG: Because this is tagged `c++` and `c++11`, not `c`. Personally, if I asked a C++ question and got a C answer/recommendation I'd be rather annoyed. ;-] – ildjarn Jul 23 '16 at 15:23