We are trying to match the following German string:
Munich tausendschöne Jungfräulein ausendschçne
We are able to match it with a PCRE regex that uses a positive lookahead and a sequence of Unicode code points written as `\x{...}` escapes, for example `(?=.+(\x{0068}\x{00F6})){1}`.
However, when we add any of the UTF-8 literals `ö`, `ä`, or `ç` directly into the PCRE regex, `pcre_compile()` complains that the regex is an invalid UTF-8 string.
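With `PCRE_UTF8` set, the pattern passed to `pcre_compile()` has to be a valid UTF-8 byte string. One possible cause of the error, which is only an assumption here and not something we have confirmed, is that the source file is saved as Latin-1, so a literal `ö` reaches the compiler as the single byte `0xF6` rather than the UTF-8 sequence `0xC3 0xB6`. The sketch below uses a hypothetical helper, `is_valid_utf8()`, which is not part of PCRE, to check the bytes of two illustrative pattern strings written with explicit byte escapes.

```c
#include <stddef.h>

/* Hypothetical helper (not part of PCRE): returns 1 if the len bytes at s
   form well-formed UTF-8, 0 otherwise. */
static int is_valid_utf8(const unsigned char *s, size_t len)
{
    size_t i = 0;
    while (i < len) {
        unsigned char c = s[i];
        size_t n; /* number of continuation bytes that must follow c */

        if (c < 0x80)                    n = 0;
        else if (c >= 0xC2 && c <= 0xDF) n = 1;
        else if (c >= 0xE0 && c <= 0xEF) n = 2;
        else if (c >= 0xF0 && c <= 0xF4) n = 3;
        else return 0; /* stray continuation byte or invalid lead byte */

        if (n > len - i - 1) return 0; /* sequence is truncated */

        for (size_t k = 1; k <= n; k++)
            if ((s[i + k] & 0xC0) != 0x80) return 0;

        /* Reject overlong forms, surrogates, and values above U+10FFFF. */
        if (c == 0xE0 && s[i + 1] < 0xA0) return 0;
        if (c == 0xED && s[i + 1] > 0x9F) return 0;
        if (c == 0xF0 && s[i + 1] < 0x90) return 0;
        if (c == 0xF4 && s[i + 1] > 0x8F) return 0;

        i += n + 1;
    }
    return 1;
}

/* Assumed case: "ö" stored as the single Latin-1 byte 0xF6. */
static const unsigned char PATTERN_LATIN1[] = "(?=.+(h\xF6))";

/* "ö" stored as the two UTF-8 bytes 0xC3 0xB6. */
static const unsigned char PATTERN_UTF8[] = "(?=.+(h\xC3\xB6))";
```

Calling `is_valid_utf8(PATTERN_LATIN1, sizeof PATTERN_LATIN1 - 1)` returns 0, while the same call on `PATTERN_UTF8` returns 1. Running the actual pattern bytes through such a check before `pcre_compile()` would show whether the literals in the source file are really UTF-8.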
Using a C/C++ PCRE regex with the `PCRE_UTF8`, `PCRE_UCP`, and `PCRE_CASELESS` options activated, what might be a valid PCRE regex that uses the UTF-8 literals `ö`, `ä`, or `ç`?