Basically, I want to understand why

`select 'aa' regexp '[h]'` returns 0, while

`select 'აა' regexp '[ჰ]'` returns 1.

check FIDDLE

bumbeishvili
  • Just an assumption: without the character class it works fine, so maybe in this case the pattern is matched as a byte sequence. The first two bytes of [ა](http://www.fileformat.info/info/unicode/char/10d0/index.htm) and [ჰ](http://www.fileformat.info/info/unicode/char/10f0/index.htm) are identical: `E1 83` (hex). So those bytes, once inside the class, would match. I also tried utf8_bin, without success. Weird :D – bobble bubble Nov 07 '15 at 11:46

1 Answer

I think MySQL's regex does not support UTF-8 yet. See bug 30241 and section 12.5.2, Regular Expressions, of the manual, which carries this warning:

Warning

The REGEXP and RLIKE operators work in byte-wise fashion, so they are not multibyte safe and may produce unexpected results with multibyte character sets. In addition, these operators compare characters by their byte values and accented characters may not compare as equal even if a given collation treats them as equal.

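You can match the byte sequence without a character class: `SELECT 'აა' REGEXP 'ჰ'` returns 0, because the full sequence `E1 83 B0` does not occur in 'აა' (`E1 83 90 E1 83 90`). Inside brackets, each byte of ჰ appears to be treated as a separate member of the class, and 'აა' contains both `E1` and `83`, which would explain why `[ჰ]` matches.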
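The byte-wise behaviour can be reproduced outside MySQL. The following is only a sketch in Python, not MySQL's actual implementation: it assumes the strings are UTF-8 encoded and that the engine compares raw bytes. The helper names `bytewise_match` and `charwise_match` are made up for illustration.

```python
import re


def bytewise_match(pattern: str, subject: str) -> int:
    """Emulate a byte-wise regex engine: match on UTF-8 bytes, not characters.

    Assumption: both pattern and subject are UTF-8 encoded.
    """
    found = re.search(pattern.encode("utf-8"), subject.encode("utf-8"))
    return 1 if found else 0


def charwise_match(pattern: str, subject: str) -> int:
    """Match on whole Unicode characters, as a multibyte-safe engine would."""
    return 1 if re.search(pattern, subject) else 0


# UTF-8 encodings: the two letters share their first two bytes.
print("ა".encode("utf-8").hex())  # e18390
print("ჰ".encode("utf-8").hex())  # e183b0

print(bytewise_match("[h]", "aa"))   # 0: single-byte characters behave as expected
print(bytewise_match("[ჰ]", "აა"))   # 1: the class holds bytes E1, 83, B0; E1 is in the subject
print(bytewise_match("ჰ", "აა"))     # 0: the full sequence E1 83 B0 is not in the subject
print(charwise_match("[ჰ]", "აა"))   # 0: character-aware matching gives the expected answer
```

This reproduces the three results from the question and the answer, which is consistent with the byte-wise explanation, though it does not prove that MySQL works the same way internally.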

Community
bobble bubble