Questions tagged [cjk]

CJK stands for Chinese, Japanese, and Korean. The tag labels issues common to these East Asian languages and their large character repertoires.

Character sets and encodings that cover these languages include:

  • Big5
  • EUC-JP
  • EUC-KR
  • Shift-JIS
  • GB2312
  • GB18030
  • ISO-2022-JP
  • Unicode
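
As a rough illustration of how these character sets differ in coverage, the sketch below encodes one string with several of them. It assumes Python's standard codec names (`big5`, `euc_jp`, `euc_kr`, `shift_jis`, `gb2312`, `gb18030`, `iso2022_jp`) are acceptable stand-ins for the sets listed above, and uses UTF-8 as the Unicode encoding form. The helper name `encode_everywhere` is hypothetical, chosen only for this example.

```python
# Python codec names assumed to correspond to the character sets listed above;
# "utf-8" stands in for Unicode as one of its encoding forms.
CODECS = [
    "big5", "euc_jp", "euc_kr", "shift_jis",
    "gb2312", "gb18030", "iso2022_jp", "utf-8",
]


def encode_everywhere(text):
    """Map each codec name to the encoded bytes, or to None when that
    codec cannot represent every character in the text."""
    results = {}
    for name in CODECS:
        try:
            results[name] = text.encode(name)
        except UnicodeEncodeError:
            # The character set has no code for at least one character.
            results[name] = None
    return results


if __name__ == "__main__":
    for name, data in encode_everywhere("日本語").items():
        print(name, data.hex() if data is not None else "not representable")
```

A `None` entry means the legacy set cannot represent the text at all, which is one reason a Unicode encoding is often the safer choice for mixed CJK content.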
1096 questions
13
votes
5 answers

UTF-8 file output in R

I'm using R 2.15.0 on Windows 7 64-bit. I would like to output Unicode (CJK) text to a file. The following code shows that a Unicode character written to a UTF-8 file connection does not work as I expected: rty <-…
Patrick
  • 187
  • 1
  • 2
  • 7
12
votes
4 answers

UTF-8 CJK characters not displaying in Java

I've been reading up on Unicode and UTF-8 encoding for a while and I think I understand it, so hopefully this won't be a stupid question: I have a file which contains some CJK characters, and which has been saved as UTF-8. I have various Asian…
Twicetimes
  • 678
  • 1
  • 7
  • 15
12
votes
3 answers

What is a TTF font I can freely use that covers Chinese, Japanese and Korean for Java?

On a black-box Linux system, neither the system nor the OpenJDK had any fonts, which caused issues for my Java application. So far, to get around this, I have copied the Lucida fonts from an Oracle Java install into the jre/lib/fonts dir and run fc-cache…
Paul Taylor
  • 13,411
  • 42
  • 184
  • 351
12
votes
2 answers

What are all the Japanese whitespace characters?

I need to split a string and extract words separated by whitespace characters. The source may be in English or Japanese. English whitespace characters include tab and space, and Japanese text uses these too. (IIRC, all widely-used Japanese character…
Mason
  • 5,071
  • 4
  • 25
  • 24
12
votes
5 answers

Python: any way to perform this "hybrid" split() on multi-lingual (e.g. Chinese & English) strings?

I have multilingual strings that mix languages that use whitespace as a word separator (English, French, etc.) with languages that don't (Chinese, Japanese, Korean). Given such a string, I want to separate the English/French/etc part…
Continuation
  • 12,722
  • 20
  • 82
  • 106
12
votes
5 answers

Programmatically determine number of strokes in a Chinese character?

Does Unicode store stroke count information about Chinese, Japanese, or other stroke-based characters?
xkdkxdxc
  • 511
  • 5
  • 9
11
votes
5 answers

Django: How to add Chinese support to the application

I am trying to add Chinese language support to my application written in Django, and I am having a really hard time with it. I have spent half a day trying different approaches, with no success. My application supports a few languages; this is part of settings.py…
Pavulon
  • 111
  • 1
  • 1
  • 7
11
votes
2 answers

sort() for Japanese

If I have set my current locale to Japanese, how can I make Japanese characters always sort ahead of non-Japanese characters? For example, right now English characters always appear before the Katakana characters.…
hao
  • 197
  • 6
11
votes
6 answers

Simplified Chinese Unicode table

Where can I find a Unicode table showing only the simplified Chinese characters? I have searched everywhere but cannot find anything. UPDATE: I have found that there is another encoding called GB 2312 - http://en.wikipedia.org/wiki/GB_2312 - which…
cmann
  • 1,920
  • 4
  • 21
  • 33
11
votes
3 answers

How do you sort CJK (Asian) characters in Perl, or with any other programming language?

How do you sort Chinese, Japanese and Korean (CJK) characters in Perl? As far as I can tell, sorting CJK characters by stroke count, then by radical, seems to be the way these languages are sorted. There are also some methods that sort by sounds,…
Neil
  • 24,551
  • 15
  • 60
  • 81
11
votes
1 answer

Making vertical Japanese text

Can anybody tell me the HTML/CSS to make Japanese text run from top to bottom, right to left (like in books), without changing the actual alignment of the characters? I am using UTF-16, if it helps.
William Edwards
  • 115
  • 1
  • 6
10
votes
2 answers

Android Application is not working in China

I developed an application for Taiwan which has Google Maps in it. It is now on the market, but when Chinese people try to install it, it gives the following error: "The Application does not install successfully." It works everywhere else. Client…
Sandip Jadhav
  • 7,377
  • 8
  • 44
  • 76
10
votes
5 answers

How to save Chinese Characters to file with java?

I use the following code to save Chinese characters into a .txt file, but when I opened it with WordPad, I couldn't read it. StringBuffer Shanghai_StrBuf = new StringBuffer("\u4E0A\u6D77"); boolean Append = true; FileOutputStream fos; fos = new…
Frank
  • 30,590
  • 58
  • 161
  • 244
10
votes
2 answers

How do tokenization and pattern matching work in Chinese?

This question involves computing as well as knowledge of Chinese. I have Chinese queries and a separate list of phrases in Chinese, and I need to find which of these queries contain any of these phrases. In English, it is a very simple…
xyz
  • 8,607
  • 16
  • 66
  • 90
10
votes
2 answers

Are Chinese characters allowed in URLs?

Are Chinese characters allowed to be entered in URLs? In my tests, Chinese characters can be entered in URLs; they are converted to Punycode, the request is sent, and the related page is reached. But currently, is…
deepWebMie
  • 1,241
  • 2
  • 10
  • 13