0

I want to return entries from a table that match the format:

prefix + optional spaces + Thai digit

Testing using ยก as the prefix I use the following SQL

SELECT term
FROM entries
WHERE term REGEXP "^ยก[\s]*[๐-๙]+$"

This returns 9 entries, 4 of which don't have the correct prefix, and none of them ends in a digit.

ยกนะ
ยกบัตร
ยกมือ
ยกยอ
ยกยอด
ยกหยิบ
ยมทูต
ยมนา
ยมบาล
ยมล

It doesn't return

ยก ๑
ยก ๒

which I know are in the database and are the entries I want.

I'm very new to all this. What am I doing wrong?

FWIW, this is against a MySQL database and everything is in Unicode.

Thanks

Oswulf
  • 41
  • 3

2 Answers2

0

As quoted from the MySQL docs:

The REGEXP and RLIKE operators work in byte-wise fashion, so they are not multi-byte safe and may produce unexpected results with multi-byte character sets. In addition, these operators compare characters by their byte values and accented characters may not compare as equal even if a given collation treats them as equal.

Doesn't seem like MySQL's REGEXP can handle the [๐-๙] range correctly due to the above.

Sam
  • 20,096
  • 2
  • 45
  • 71
  • That certainly is the problem then. Thanks. I find it amazing that in this day and age MySQL can't handle Unicode. – Oswulf May 31 '14 at 00:02
0

I use utf8_general_ci and try.I matched

ยกนะ 

with "^ยก[\s]*[๐-๙]+$" but did't matched ยก ๑.So I change the regexp to

"^ยก[ ]*[๐-๙]+$"

,and it can match

ยกนะ 
ยก ๑

Maybe the problem is character encoding.