Highest Voted 'cjk' Questions

23 votes, 9 answers

How to do a Python split() on languages (like Chinese) that don't use whitespace as a word separator?

I want to split a sentence into a list of words. For English and other European languages this is easy with split(): >>> "This is a sentence.".split() gives ['This', 'is', 'a', 'sentence.']. But I also need to deal with sentences in languages such as…

asked Sep 26 '10 at 12:21 by Continuation (12,722 reputation)
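Whitespace splitting cannot recover word boundaries in Chinese, so some form of dictionary-based segmentation is needed. Below is a minimal sketch of forward maximum matching, one classic technique for this; the function name `max_match`, the `max_len` cutoff, and the three-word dictionary are hypothetical, and real segmenters combine much larger dictionaries with statistical models.

```python
def max_match(text, dictionary, max_len=4):
    """Greedy forward maximum matching: at each position take the longest
    dictionary word starting there, falling back to a single character."""
    words = []
    i = 0
    while i < len(text):
        # Try the longest candidate first, shrinking down to one character.
        for size in range(min(max_len, len(text) - i), 0, -1):
            candidate = text[i:i + size]
            if size == 1 or candidate in dictionary:
                words.append(candidate)
                i += size
                break
    return words


# Hypothetical toy dictionary; a real one would hold tens of thousands of words.
DICTIONARY = {"我们", "要去", "哪里"}

print(max_match("我们要去哪里", DICTIONARY))  # ['我们', '要去', '哪里']
```

Greedy matching can choose the wrong split when candidate words overlap, which is why production segmenters add statistics on top of the dictionary.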

22 votes, 7 answers

Convert numbered pinyin to pinyin with tone marks

Are there any scripts, libraries, or programs using Python, or BASH tools (e.g. awk, perl, sed) which can correctly convert numbered pinyin (e.g. dian4 nao3) to UTF-8 pinyin with tone marks (e.g. diàn nǎo)? I have found the following examples, but…

python bash cjk

asked Nov 20 '11 at 08:32 by Village (22,513 reputation)
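The conversion follows the standard tone-mark placement rule: the mark goes on "a" or "e" if present, on the "o" of "ou", and otherwise on the last vowel. A minimal Python sketch, assuming lowercase input and accepting "v" or "u:" as stand-ins for "ü"; tone 5 is treated as neutral (no mark). The function names are hypothetical.

```python
import re

# Precomposed vowels for tones 1-4, indexed by tone - 1.
TONE_MARKS = {
    "a": "āáǎà",
    "e": "ēéěè",
    "i": "īíǐì",
    "o": "ōóǒò",
    "u": "ūúǔù",
    "ü": "ǖǘǚǜ",
}


def convert_syllable(syllable):
    """Turn one lowercase numbered syllable such as 'dian4' into 'diàn'."""
    letters, tone = syllable[:-1], int(syllable[-1])
    # 'v' and 'u:' are common ASCII stand-ins for 'ü'.
    letters = letters.replace("u:", "ü").replace("v", "ü")
    vowel_positions = [i for i, c in enumerate(letters) if c in TONE_MARKS]
    if tone == 5 or not vowel_positions:
        return letters  # neutral tone, or no vowel to mark
    # Placement rule: 'a' or 'e' takes the mark; in 'ou' the 'o' does;
    # otherwise the last vowel does.
    if "a" in letters:
        pos = letters.index("a")
    elif "e" in letters:
        pos = letters.index("e")
    elif "ou" in letters:
        pos = letters.index("o")
    else:
        pos = vowel_positions[-1]
    marked = TONE_MARKS[letters[pos]][tone - 1]
    return letters[:pos] + marked + letters[pos + 1:]


def numbered_to_marked(text):
    """Convert every numbered syllable in text, leaving other text alone."""
    return re.sub(r"[a-zü:]+[1-5]", lambda m: convert_syllable(m.group(0)), text)


print(numbered_to_marked("dian4 nao3"))  # diàn nǎo
```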

22 votes, 2 answers

Are all Kanji characters in UTF-8 3 bytes long?

Can someone please confirm that all Kanji characters in Chinese are 3 bytes long in UTF-8?

unicode utf-8 character-encoding cjk

asked Sep 09 '10 at 16:55 by TopCoder (4,206 reputation)
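A quick way to check is to encode a few characters and count the bytes. UTF-8 uses 3 bytes for code points U+0800 to U+FFFF and 4 bytes for U+10000 and above, so characters in the main CJK Unified Ideographs block take 3 bytes, while those in CJK Unified Ideographs Extension B (starting at U+20000) take 4. The helper name `utf8_length` is only for illustration.

```python
def utf8_length(ch):
    """Number of bytes the character occupies when encoded as UTF-8."""
    return len(ch.encode("utf-8"))


samples = {
    "U+4E2D (CJK Unified Ideographs, BMP)": "\u4e2d",
    "U+20000 (CJK Unified Ideographs Extension B)": "\U00020000",
}
for label, ch in samples.items():
    print(label, utf8_length(ch), "bytes")  # 3 bytes, then 4 bytes
```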

22 votes, 5 answers

How do I add a font in gVim on a Windows system?

I wanted to add a UTF-8 font in gVim but could not find out how to do this. I tried to follow the steps in this manual, but it still did not work: http://www.inter-locale.com/whitepaper/learn/learn_to_type.html (Vim section, halfway down the page). Can…

vim unicode fonts cjk

asked Oct 25 '08 at 00:13 by user18383 (551 reputation)

21 votes, 9 answers

How does a file with Chinese characters know how many bytes to use per character?

I have read Joel's article "The Absolute Minimum Every Software Developer Absolutely, Positively Must Know About Unicode and Character Sets (No Excuses!)" but still don't understand all the details. An example will illustrate my issues. Look at this…

unicode encoding cjk

asked Apr 22 '09 at 01:40 by Petras (4,686 reputation)
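Assuming the file is UTF-8 (the excerpt does not say which encoding is used, and legacy encodings such as GB2312 or Big5 work differently), no per-file byte count is needed: each character starts with a lead byte whose high-order bits announce how long its sequence is. A minimal sketch of a splitter that relies only on that rule; the function names are illustrative, and continuation bytes are not validated.

```python
def sequence_length(lead_byte):
    """Bytes in the UTF-8 sequence that starts with lead_byte."""
    if lead_byte < 0x80:
        return 1  # 0xxxxxxx: ASCII
    if lead_byte >> 5 == 0b110:
        return 2  # 110xxxxx
    if lead_byte >> 4 == 0b1110:
        return 3  # 1110xxxx: most CJK characters
    if lead_byte >> 3 == 0b11110:
        return 4  # 11110xxx: characters outside the BMP
    # 10xxxxxx is a continuation byte and can never start a character.
    raise ValueError(f"not a UTF-8 lead byte: {lead_byte:#04x}")


def split_utf8(data):
    """Split raw UTF-8 bytes into one bytes object per character."""
    chunks, i = [], 0
    while i < len(data):
        n = sequence_length(data[i])
        chunks.append(data[i:i + n])
        i += n
    return chunks


print(split_utf8("a中".encode("utf-8")))  # [b'a', b'\xe4\xb8\xad']
```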

21 votes, 7 answers

How to convert Chinese characters to Pinyin

For sorting Chinese-language text, I want to convert Chinese characters to Pinyin, properly separating each Chinese character and grouping successive characters together. Can you please help me with this task by providing the logic or source code for…

sorting cjk

asked Jan 27 '11 at 05:35 by Ashish Yadav (211 reputation)
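Converting characters to pinyin requires a character-to-reading table; there is no formula for it. The sketch below assumes such a table exists and shows only how it could drive sorting. `PINYIN` is a hypothetical three-entry stand-in for a full table, and characters with several readings (for example 行, read xíng or háng) would need context to pick the right one.

```python
# Hypothetical excerpt of a character-to-pinyin table (numbered tones).
PINYIN = {"中": "zhong1", "国": "guo2", "人": "ren2"}


def pinyin_key(text):
    """Sort key: the pinyin of each character in order. Unknown characters
    fall back to themselves so they still sort deterministically."""
    return [PINYIN.get(ch, ch) for ch in text]


words = ["中国", "人", "国"]
print(sorted(words, key=pinyin_key))  # ['国', '人', '中国']
```

Comparing lists of per-character readings keeps each character's syllable separate while grouping words that share leading syllables.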

20 votes, 3 answers

Encoding mail subject (SMTP) in Python with non-ASCII characters

I am using the Python module MimeWriter to construct a message and smtplib to send it. The constructed message is: file msg.txt: ----------------------- Content-Type: multipart/mixed; from: me to: me@abc.com subject: 主題 Content-Type:…

python utf-8 character-encoding smtp cjk

asked Aug 02 '11 at 13:53 by Rakesh (271 reputation)
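Non-ASCII header values have to be sent as RFC 2047 "encoded words" such as `=?utf-8?b?...?=`. A minimal sketch using the standard `email` package, which superseded MimeWriter, assuming UTF-8 is acceptable to the receiving clients; the addresses are the placeholders from the question, and the function names are hypothetical.

```python
from email.header import Header, decode_header, make_header
from email.mime.text import MIMEText


def encode_subject(subject):
    """RFC 2047-encode a header value as UTF-8 (base64 encoded word)."""
    return Header(subject, "utf-8").encode()


def build_message(subject, body):
    """Build a UTF-8 plain-text message; the Header object is encoded
    automatically when the message is serialized."""
    msg = MIMEText(body, "plain", "utf-8")
    msg["Subject"] = Header(subject, "utf-8")
    msg["From"] = "me"
    msg["To"] = "me@abc.com"
    return msg


encoded = encode_subject("主題")
print(encoded)  # =?utf-8?b?5Li76aGM?=
# Decoding the encoded word gives back the original text.
print(str(make_header(decode_header(encoded))))  # 主題
print(build_message("主題", "本文").as_string())
```

The same `Header` assignment works on a multipart message, and the result of `as_string()` is what would be handed to `smtplib.SMTP.sendmail(from_addr, to_addrs, msg)`.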

20 votes, 2 answers

Regular Expression for Japanese characters

I am doing internationalization in Struts and want to write JavaScript validation for both Japanese and English users. I know the regular expression for English, but not for Japanese. Is it possible to write one regular expression for both groups of users…

javascript regex unicode internationalization cjk

asked Jul 22 '11 at 08:54 by Nilesh Shukla (309 reputation)
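One regular expression can serve both groups of users by listing Unicode block ranges in a single character class. The sketch is in Python for consistency with the other examples, but the same `\uXXXX` ranges work inside a JavaScript character class, where the check would be `/^[...]+$/.test(value)`. Which characters count as valid (here ASCII letters, digits, spaces, Hiragana, Katakana, half-width Katakana, and the main CJK Unified Ideographs block) is an assumption to adjust; the names are hypothetical.

```python
import re

# ASCII letters/digits/space, Hiragana (U+3040-309F), Katakana (U+30A0-30FF),
# CJK Unified Ideographs (U+4E00-9FFF), half-width Katakana (U+FF66-FF9F).
NAME_PATTERN = re.compile(
    r"[A-Za-z0-9 \u3040-\u309F\u30A0-\u30FF\u4E00-\u9FFF\uFF66-\uFF9F]+"
)


def is_valid_name(text):
    """True if text is non-empty and every character is in an allowed range."""
    return NAME_PATTERN.fullmatch(text) is not None
```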

19 votes, 12 answers

What is the fastest way to delete lines in a file which have no match in a second file?

I have two files, wordlist.txt and text.txt. The first file, wordlist.txt, contains a huge list of words in Chinese, Japanese, and Korean, e.g.: 你你们我 The second file, text.txt, contains long passages, e.g.: 你们要去哪里？卡拉OK好不好？ I want to create a…

ruby perl bash python-2.7 cjk

asked Mar 20 '12 at 02:01 by Village (22,513 reputation)
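Assuming the goal is to keep only the words from wordlist.txt (one word per line) that occur somewhere in text.txt, here is a Python sketch. Instead of scanning the long text once per word, it puts every substring of the text, up to the longest word length, into a set, so each word check is a single hash lookup. That trades memory for speed; whether it is actually fastest on the real files is untested. The function name is hypothetical, and the strings stand in for the two files.

```python
def words_present(words, text):
    """Return the words (in their original order) that occur in text."""
    max_len = max((len(w) for w in words), default=0)
    # Index every substring of text up to the longest word length.
    substrings = set()
    for start in range(len(text)):
        for end in range(start + 1, min(start + max_len, len(text)) + 1):
            substrings.add(text[start:end])
    return [w for w in words if w in substrings]


wordlist = ["你", "你们", "我"]
text = "你们要去哪里？卡拉OK好不好？"
print(words_present(wordlist, text))  # ['你', '你们']
```

With real files, both inputs could be read with `open(path, encoding="utf-8")` and the surviving words written back one per line.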

19 votes, 4 answers

Prevent/workaround browser converting '\n' between lines into space (for Chinese characters)

Converting a newline into a space makes sense for English. For example, take the following HTML:

This is a sentence.

We get the following after the browser converts the newline into a space: This is a sentence. This is good for English, but not…

html browser cjk

asked Dec 18 '11 at 06:03 by cyfdecyf (816 reputation)
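One workaround is to remove, before the HTML is served, any line break that sits between two CJK characters, while leaving breaks between Latin words alone so they still render as spaces. A minimal server-side sketch using lookbehind and lookahead; the names are hypothetical, and only the main CJK Unified Ideographs block (U+4E00 to U+9FFF) is treated as CJK here, so CJK punctuation such as 。 would need extra ranges.

```python
import re

# A newline (plus surrounding spaces/tabs) flanked by CJK ideographs.
CJK_NEWLINE = re.compile(r"(?<=[\u4e00-\u9fff])[ \t]*\n[ \t]*(?=[\u4e00-\u9fff])")


def join_cjk_lines(html):
    """Drop line breaks between CJK characters; keep all others."""
    return CJK_NEWLINE.sub("", html)
```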

19 votes, 6 answers

Conversion from Simplified to Traditional Chinese

If a website is localized/internationalized with a Simplified Chinese translation, is it possible to reliably and automatically convert the text to Traditional Chinese with high quality? If so, is it going to be extremely high quality or just a…

php localization internationalization cjk

asked May 13 '11 at 23:05 by philfreo (41,941 reputation)
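A character-by-character mapping shows why fully automatic conversion is hard: several Traditional characters were merged into one Simplified character. In the sketch below the four-entry table is a hypothetical excerpt. Mapping 发 to 發 is right for 发展 (發展) but wrong for 头发 ("hair"), which should become 頭髮, so high-quality output needs phrase-level tables and context.

```python
# Hypothetical excerpt of a one-to-one Simplified -> Traditional table.
S2T = str.maketrans({"汉": "漢", "语": "語", "头": "頭", "发": "發"})


def naive_to_traditional(text):
    """Map each character independently, ignoring context."""
    return text.translate(S2T)


print(naive_to_traditional("汉语"))  # 漢語 (correct)
print(naive_to_traditional("头发"))  # 頭發 (wrong: should be 頭髮)
```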

19 votes, 6 answers

How to print Chinese words in my code using Python?

This is my code: print '哈哈'.decode('gb2312').encode('utf-8') ...and it prints: SyntaxError: Non-ASCII character '\xe5' in file D:\zjm_code\a.py on line 2, but no encoding declared; see http://www.python.org/peps/pep-0263.html for details How do I…

python cjk

asked Apr 22 '10 at 03:18 by zjm1126 (63,397 reputation)
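The SyntaxError comes from the missing source-encoding declaration that PEP 263 requires in Python 2, not from the decode/encode call. The byte 0xE5 in the message is also the first UTF-8 byte of 哈 (U+54C8), which suggests the file was saved as UTF-8, so decoding the literal as GB2312 would be the wrong step. A sketch in Python 3 syntax, where UTF-8 is already the default source encoding and the coding line is harmless:

```python
# -*- coding: utf-8 -*-
# The declaration above is what PEP 263 asks for in Python 2; Python 3
# already assumes UTF-8 source files.

text = "哈哈"  # a Unicode str in Python 3; no .decode() needed

# 0xE5 is the first UTF-8 byte of 哈 (U+54C8).
utf8_bytes = text.encode("utf-8")
# Only needed if a GB2312 byte stream is actually required somewhere.
gb_bytes = text.encode("gb2312")

print(text)
print(utf8_bytes)  # b'\xe5\x93\x88\xe5\x93\x88'
```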

18 votes, 4 answers

How can I detect certain Unicode characters in a string in Ruby?

Given a string in Ruby 1.8.7 (without the awesome Oniguruma regular expression engine that supports Unicode properties with \p{}), I would like to be able to determine if the string contains one or more Chinese, Japanese, or Korean characters;…

ruby unicode encoding character-encoding cjk

asked Jan 13 '11 at 14:22 by Josh Glover (25,142 reputation)
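Without `\p{}` support, one option is to compare each character's code point against the Unicode blocks of interest. The sketch is in Python for consistency with the other examples; in Ruby 1.8.7 the same code points should be obtainable with `string.unpack('U*')`. The block list is a hypothetical, deliberately short selection (no CJK extensions, no half-width forms), and the names are illustrative.

```python
# Main blocks only; extend as needed.
CJK_RANGES = [
    (0x3040, 0x309F),  # Hiragana
    (0x30A0, 0x30FF),  # Katakana
    (0x4E00, 0x9FFF),  # CJK Unified Ideographs
    (0xAC00, 0xD7AF),  # Hangul Syllables
]


def contains_cjk(text):
    """True if any character's code point falls in one of CJK_RANGES."""
    return any(lo <= ord(ch) <= hi for ch in text for lo, hi in CJK_RANGES)
```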

18 votes, 2 answers

Word break in languages without spaces between words (e.g., Asian)?

I'd like to make MySQL full text search work with Japanese and Chinese text, as well as any other language. The problem is that these languages and probably others do not normally have white space between words. Search is not useful when you must…

php full-text-search tokenize cjk wordbreaker

asked Oct 22 '09 at 06:26 by Joe Langeway (300 reputation)
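A common workaround when no dictionary-based word breaker is available is to index overlapping character bigrams and split queries the same way, so 東京都 is indexed as 東京 and 京都. This favors recall at the cost of false positives: a search for 京都 (Kyoto) also matches 東京都 (Tokyo Metropolis). A sketch of the tokenizer alone, with a hypothetical function name; how the tokens reach the MySQL index is left open.

```python
def cjk_bigrams(run):
    """Overlapping two-character tokens for a run of CJK text.
    A single character is returned as its own token."""
    if len(run) < 2:
        return [run] if run else []
    return [run[i:i + 2] for i in range(len(run) - 1)]


print(cjk_bigrams("東京都"))  # ['東京', '京都']
```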

18 votes, 1 answer

Drawing multilingual text using PIL

I'm having trouble drawing multilingual text using PIL. Let's say I want to draw text - "ひらがな - Hiragana, 히라가나". But PIL's ImageDraw.text() function takes only one font at a time, so I cannot draw this text correctly, because it requires English,…

unicode fonts python-imaging-library cjk imaging

asked Jul 10 '12 at 10:51 by redism (500 reputation)
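Since `ImageDraw.text()` takes one font per call, one approach is to split the string into runs by script, draw each run with a font that covers it, and advance the x position by each run's width. The sketch covers only the splitting step, with a coarse, hypothetical classification (Hiragana/Katakana as "ja", Hangul syllables as "ko", everything else as "latin"); choosing fonts and measuring run widths depend on the PIL version and are not shown.

```python
from itertools import groupby


def script_of(ch):
    """Coarse, hypothetical script classifier used to pick a font."""
    cp = ord(ch)
    if 0x3040 <= cp <= 0x30FF:
        return "ja"  # Hiragana and Katakana
    if 0xAC00 <= cp <= 0xD7AF:
        return "ko"  # Hangul syllables
    return "latin"


def split_runs(text):
    """Split text into (script, run) pairs of consecutive same-script chars."""
    return [(script, "".join(chars)) for script, chars in groupby(text, key=script_of)]


print(split_runs("ひらがな - Hiragana, 히라가나"))
# [('ja', 'ひらがな'), ('latin', ' - Hiragana, '), ('ko', '히라가나')]
```

Each run would then be drawn by its own `ImageDraw.text()` call with the matching font.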
