Is there any way to get libc6's regexp functions regcomp and regexec to work properly with multi-byte characters?
For instance, if my pattern is the UTF-8 characters 猫机+猫, finding a match on the UTF-8 encoded string 猫机机机猫 will fail, where it should succeed.
I think this is because the byte representation of the character 机 is \xe6\x9c\xba, and the + is matching one or more of the byte \xba rather than one or more of the whole character. I can make this instance work by putting parentheses around each multi-byte character in the pattern, but since this is for an application, I can't require users to do this.
Is there a way to flag a pattern or string as containing UTF-8 characters? Perhaps by telling libc to store the pattern as wchar_t instead of char?
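
For reference, here is a minimal sketch of the case above. It rests on an assumption I have not confirmed from the documentation: that regcomp interprets the pattern according to the LC_CTYPE locale in effect when it is called, so that under a UTF-8 locale the + would apply to the whole character 机 rather than to its last byte. The helper names ere_matches and use_utf8_ctype are my own, and the locale names tried are only guesses at what a given system has installed.

```c
#include <locale.h>
#include <regex.h>
#include <stddef.h>

/* Hypothetical helper: compile `pattern` as a POSIX extended regex and
 * test it against `subject`.
 * Returns 1 on match, 0 on no match, -1 if the pattern does not compile. */
static int ere_matches(const char *pattern, const char *subject)
{
    regex_t re;
    if (regcomp(&re, pattern, REG_EXTENDED | REG_NOSUB) != 0)
        return -1;
    int rc = regexec(&re, subject, 0, NULL, 0);
    regfree(&re);
    return rc == 0 ? 1 : 0;
}

/* Hypothetical helper: try a few common UTF-8 locale names for LC_CTYPE.
 * Which names exist depends on the system (assumption).
 * Returns 1 if one of them could be set, 0 otherwise. */
static int use_utf8_ctype(void)
{
    static const char *const names[] = {
        "C.UTF-8", "en_US.UTF-8", "en_US.utf8"
    };
    for (size_t i = 0; i < sizeof names / sizeof names[0]; i++)
        if (setlocale(LC_CTYPE, names[i]) != NULL)
            return 1;
    return 0;
}
```

The idea would be to call use_utf8_ctype() once at startup (or setlocale(LC_ALL, "") to pick up the user's environment) before any call to regcomp, and then call ere_matches("猫机+猫", "猫机机机猫"). A program that never calls setlocale runs in the "C" locale, which would explain the byte-by-byte behaviour.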