3

I am writing a web app that needs to sort a set of Chinese characters. Can databases sort Chinese characters, and if so, what order do they use?

For reference I will be using PostgreSQL.

Makoto
bash-

2 Answers

1

PostgreSQL sorts text using the operating system's locale facility, so you get exactly the same ordering that operating system tools such as sort give you. Set your locale to something suitable, such as zh_HK.utf8, when you initialize the database system.

If you don't like the results of that sort, you'll have to come up with a custom solution.
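
To preview what the operating system locale would do before committing to it, something like the following Python sketch may help. It is not PostgreSQL code; it only asks the same OS collation facility to order a list. The function name sort_with_os_locale is made up for this example, and it assumes the zh_HK.utf8 locale may or may not be installed on your machine, so it returns None when it is missing.

```python
import locale


def sort_with_os_locale(words, locale_name="zh_HK.utf8"):
    """Sort words using the OS collation rules for locale_name.

    Returns None if the locale is not installed on this system.
    (Hypothetical helper, for illustration only.)
    """
    # Remember the current collation setting so it can be restored afterwards.
    previous = locale.setlocale(locale.LC_COLLATE)
    try:
        locale.setlocale(locale.LC_COLLATE, locale_name)
    except locale.Error:
        return None
    try:
        # strxfrm turns each string into a key that compares per the locale.
        return sorted(words, key=locale.strxfrm)
    finally:
        locale.setlocale(locale.LC_COLLATE, previous)


result = sort_with_os_locale(["中", "一", "大", "国", "人"])
print(result if result is not None else "locale not installed")
```

The resulting order depends entirely on the locale definitions shipped with your operating system, so it is worth checking on the same OS that will host the database.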

Peter Eisentraut
0

The easiest and most common way to sort them is as binary data, either by Unicode code point or, even more simply, by raw bytes (which does work well for ASCII data). Unfortunately, that does not make for a very meaningful sort order. It does group related strings together, though, so things like prefix queries should work.
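
As a rough illustration of the binary ordering, here is a small Python sketch. The function names and the sample words are made up for the example. Python compares strings by code point, and UTF-8 byte order happens to agree with code point order, so both approaches give the same result.

```python
def sort_by_code_point(words):
    # Python compares str values code point by code point.
    return sorted(words)


def sort_by_utf8_bytes(words):
    # Comparing the UTF-8 encodings byte by byte gives the same order.
    return sorted(words, key=lambda w: w.encode("utf-8"))


words = ["中国", "大人", "中人", "一"]
print(sort_by_code_point(words))  # ['一', '中人', '中国', '大人']
print(sort_by_utf8_bytes(words))  # same order as above
```

The two words starting with 中 end up next to each other, which is why prefix queries still work, even though the overall order follows neither pronunciation nor stroke count.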

For meaningful sort order, there is no good algorithmic solution. You'd need to work with lookup tables (see for example this thread about mapping Chinese to pinyin, by which you could then sort).
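
A minimal sketch of the lookup-table idea in Python, assuming a tiny hand-written table. The PINYIN dict and the pinyin_key function are hypothetical; a real table would need to cover every character you expect, and this one ignores tones and characters with multiple readings.

```python
# Tiny illustrative table (toneless Mandarin pinyin); not a complete mapping.
PINYIN = {
    "大": "da",
    "国": "guo",
    "人": "ren",
    "一": "yi",
    "中": "zhong",
}


def pinyin_key(word):
    """Build a sort key from the per-character romanization.

    Characters missing from the table sort after known ones,
    ordered among themselves by code point.
    """
    return tuple((ch not in PINYIN, PINYIN.get(ch, ch)) for ch in word)


chars = ["中", "一", "大", "国", "人"]
print(sorted(chars, key=pinyin_key))  # ['大', '国', '人', '一', '中']
```

The same shape should work with any other table, for example stroke counts or a Cantonese romanization, by swapping the dict. In a database you would probably store the computed key in its own column and order by that.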

Thilo
  • hmm that is a problem... I'm from Hong Kong and we do not have standardized phonetics for Cantonese and no one actually knows Mandarin pinyin :\ thanks for the direction though – bash- Sep 26 '11 at 11:36