Questions tagged [cjk]

CJK stands for Chinese, Japanese and Korean and is used to label issues common to these East Asian languages and their large character repertoires.

These East Asian languages are covered by various character sets and encodings, including:

  • Big5
  • EUC-JP
  • EUC-KR
  • Shift-JIS
  • GB2312
  • GB18030
  • ISO-2022-JP
  • Unicode
1096 questions
10 votes, 3 answers

Can I get Console to show Chinese?

I've always wondered if it would be possible to show UTF-8 or UTF-16 Chinese text in a Console window, e.g., Console.WriteLine(chinese). For the time being, it shows up as ???. Is it possible to kick up a Console session that supports Chinese…
tofutim
10 votes, 2 answers

Any tools to programmatically convert Japanese sentence into its romaji (phonetical reading)?

Input: 日本が好きです. Output: Nippon ga sukidesu. Phonetical reading is unfortunately not available through Google Translate API.
Arman
10 votes, 2 answers

C# Regex.Split is working differently than JavaScript

I'm trying to convert this long JS regex to C#. The JS code below gives 29 items in an array starting from ["","常","","に","","最新","、","最高"...] var keywords =…
Youngjae
10 votes, 1 answer

Enter keyup event in Japanese input

I have an input field on which I'm listening to keyup events. Using the Japanese input method I start typing characters and the event doesn't get triggered, which is expected as the entered characters are being converted to hiragana and a drop down…
Javier Mr
10 votes, 4 answers

Adjust the vertical positioning of ruby text

I'd like to use HTML to mark up Japanese text with its pronunciation. However, I've found that at large font sizes, the baseline of the text is well above the top of the characters it's marking up. Here's an example which shows what I…
Rose Kunkel
10 votes, 2 answers

How to determine if a character is a Chinese character

How can I determine if a character is a Chinese character using Ruby?
HelloWorld
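
One common approach, sketched here in Python rather than the Ruby the question asks about, is to test a character's code point against the Unicode Han ideograph blocks. The range list and the name `is_han` are illustrative assumptions, and the list is not exhaustive. Note that this check cannot tell Chinese hanzi from Japanese kanji, since both share the same code points.

```python
# Illustrative sketch: code point ranges for some Han ideograph blocks.
# The list is an assumption and omits later extension blocks.
HAN_RANGES = [
    (0x3400, 0x4DBF),    # CJK Unified Ideographs Extension A
    (0x4E00, 0x9FFF),    # CJK Unified Ideographs
    (0xF900, 0xFAFF),    # CJK Compatibility Ideographs
    (0x20000, 0x2A6DF),  # CJK Unified Ideographs Extension B
]


def is_han(ch: str) -> bool:
    """Return True if the single character ch falls in one of HAN_RANGES."""
    cp = ord(ch)
    return any(lo <= cp <= hi for lo, hi in HAN_RANGES)
```

With these ranges, `is_han("国")` is true while kana such as `"あ"` and Hangul such as `"한"` are rejected.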
10 votes, 2 answers

Regex for Matching Pinyin

I'm looking for a regular expression that can correctly match valid pinyin (e.g. "sheng", "sou") while ignoring invalid pinyin (e.g. "shong", "sei"). Most of the regex provided in the top Google results match invalid pinyin in some cases. Obviously,…
stevendaniels
9 votes, 4 answers

How to parse UTF-8 characters in Excel files using POI

I have been using POI to parse XLS and XLSX files successfully. However, I am unable to correctly extract special characters, such as UTF-8 encoded characters like Chinese or Japanese, from an Excel spreadsheet. I have figured out how to extract…
user1198370
9 votes, 3 answers

Chinese Unicode fonts in PyGame

How can I display Chinese characters in PyGame? And what's a good free/libre font to use for this purpose?
Bopete
9 votes, 2 answers

Convert Unicode into character with Ruby

I found a dictionary of Chinese characters in Unicode. I'm trying to build a database of characters out of this dictionary, but I don't know how to convert a Unicode code point to a character. p "国".unpack("U*").first #this gives the unicode 22269 How can…
thenengah
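
The conversion in the excerpt maps a character to its integer code point; going the other way is the inverse operation. A minimal sketch in Python (not the Ruby the question uses, where the usual inverse of `unpack("U*")` is `Array#pack("U*")`) is shown below. The variable names and the hexadecimal input format are assumptions, since the excerpt does not show how the dictionary stores its code points.

```python
# Character -> integer code point (mirrors the unpack call in the excerpt).
code_point = ord("国")

# Integer code point -> character (the direction the question asks about).
char = chr(22269)

# Assumed case: the dictionary stores code points as hex strings like "56FD".
from_hex = chr(int("56FD", 16))
```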
9 votes, 1 answer

Differentiating CJK languages (Chinese, Japanese, Korean) in Android

I want to be able to recognize Chinese, Japanese, and Korean written characters, both as a general group and as subdivided languages. These are the reasons: Recognize CJK as a general group: I am making a vertical script Mongolian TextView. To do…
Suragch
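
A rough heuristic for this kind of classification, sketched in Python rather than with any Android API, is to look for scripts that are unique to one language: kana implies Japanese and Hangul implies Korean, while Han ideographs alone stay ambiguous. The function name `classify_cjk`, the returned labels, and the choice of ranges are all assumptions made for illustration.

```python
def classify_cjk(text: str) -> str:
    """Guess the language group of text from the scripts it contains."""
    has_han = False
    for ch in text:
        cp = ord(ch)
        if 0x3040 <= cp <= 0x30FF:
            # Hiragana (U+3040-309F) and Katakana (U+30A0-30FF)
            return "japanese"
        if 0xAC00 <= cp <= 0xD7AF or 0x1100 <= cp <= 0x11FF:
            # Hangul Syllables and Hangul Jamo
            return "korean"
        if 0x4E00 <= cp <= 0x9FFF:
            # CJK Unified Ideographs, shared by Chinese, Japanese and Korean
            has_han = True
    return "han (ambiguous)" if has_han else "not cjk"
```

The function returns as soon as it sees the first kana or Hangul character, so mixed-language text is labeled by whichever of those scripts appears first. Text written only in Han ideographs cannot be attributed to a single language by code points alone.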
9 votes, 2 answers

How to match Chinese characters with grep?

It is verified that [\u4e00-\u9fff] can match Chinese characters in Vim. :%g/[\u4e00-\u9fff]/d The command above deletes all the lines containing Chinese characters. ls /tmp/test ktop 1_001.png.bak fonts.dir.bak New Screenshot from 2016-09-12…
showkey
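
The same character class from the Vim command can be applied outside the editor. The sketch below is Python, not grep, and only illustrates filtering lines by the `[\u4e00-\u9fff]` range; the helper name `lines_with_han` and the sample file names in the usage note are hypothetical.

```python
import re

# Same range as in the Vim command: the CJK Unified Ideographs block.
HAN = re.compile("[\u4e00-\u9fff]")


def lines_with_han(lines):
    """Return only the lines that contain at least one character in the range."""
    return [line for line in lines if HAN.search(line)]
```

For example, `lines_with_han(["fonts.dir.bak", "中文.txt"])` keeps only the second name.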
9 votes, 3 answers

Is there something better than the kakasi library for gojûon collation?

"Better" primarily means accuracy, but I am also interested in any other criteria in which other systems excel. I sampled the Perl binding Text::Kakasi for correctness in an admittedly limited fashion and it works just fine for our needs. use…
daxim
9 votes, 1 answer

Why is my QML CJK text being rendered with corrupt glyphs?

My application allows the user to switch languages on the fly. I'm seeing that about 10% of the time the user switches to Chinese or Japanese, the glyphs for the UI text are being rendered improperly. This application is running under Linux on an…
Matthew Reynolds
9 votes, 2 answers

Detect Chinese (multibyte) character in the string

$str = "This is a string containing 中文 characters. Some more characters - 中华人民共和国 "; How do I detect Chinese characters in this string and print the part which starts with the first character and ends with "-"? (it would be "中文 characters. Some…
Josh
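
One way to approach the extraction, sketched in Python rather than the PHP the question uses, is a single regular expression that starts at the first character in the CJK Unified Ideographs range and lazily extends to the first "-". Whether the hyphen itself belongs in the result is an assumption here, since the expected output in the excerpt is truncated.

```python
import re

text = "This is a string containing 中文 characters. Some more characters - 中华人民共和国 "

# Start at the first character in U+4E00-U+9FFF, then match lazily
# up to and including the first "-" that follows it.
match = re.search("[\u4e00-\u9fff].*?-", text)
result = match.group(0) if match else None
```

If no such character or no following hyphen exists, `result` is `None`.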