Highest Voted 'cjk' Questions

7

votes

3 answers

How do I format Chinese characters so they fit the columns?

I am trying to print some information in a column-oriented way. Everything works well for Latin characters, but when Chinese characters are printed, the columns stop being aligned. Let's consider an example: var latinPresentation1 = "some…

c# string formatting cjk

asked Jan 16 '19 at 11:26

artsch

225
2
10
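For the column-alignment question above, one common approach is to pad by display width rather than by character count, since most CJK characters take two terminal cells. The question is about C#; the following is only a minimal Python sketch of the idea. The names `display_width` and `pad_to_width` and the sample strings are hypothetical, and the result assumes a monospaced font that renders wide characters at exactly twice the width of Latin ones.

```python
import unicodedata

def display_width(text: str) -> int:
    """Approximate terminal cell count: wide/fullwidth characters take two cells."""
    width = 0
    for ch in text:
        if unicodedata.combining(ch):
            continue  # combining marks occupy no cell of their own
        width += 2 if unicodedata.east_asian_width(ch) in ("W", "F") else 1
    return width

def pad_to_width(text: str, columns: int) -> str:
    """Right-pad with spaces until the text fills `columns` terminal cells."""
    return text + " " * max(0, columns - display_width(text))

# Hypothetical sample rows: one Latin label, one Chinese label.
for label, value in [("some text", "1"), ("中文字符", "2")]:
    print(pad_to_width(label, 12) + "|" + value)
```

Characters with East Asian width class "A" (ambiguous) are counted as one cell here; some terminals draw them as two.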

7

votes

2 answers

Korean, Thai and Indonesian POS tagger

Can someone recommend an open-source POS tagger for Korean, Indonesian, Thai and Vietnamese that I can use to tag the corpus data I currently have (e.g. something like the stanford-postagger)? If you are a dev and care to share and let me test out the POS…

nlp nltk cjk pos-tagger thai

asked Mar 12 '11 at 04:31

alvas

115,346
109
446
738

7

votes

2 answers

Word wrap algorithms for Japanese

In a recent web application I built, I was pleasantly surprised when one of our users decided to use it to create something entirely in Japanese. However, the text was wrapped strangely and awkwardly. Apparently browsers don't cope with wrapping…

algorithm unicode internationalization cjk word-wrap

asked Jan 19 '10 at 00:45

Breton

15,401
3
59
76

7

votes

7 answers

Japanese ASCII Code

Where can I get a list of ASCII codes corresponding to Japanese kanji, hiragana and katakana characters? I am writing a Java function and some JavaScript that determine whether a character is Japanese. What is its range in the ASCII code?

unicode cjk

asked Nov 26 '09 at 04:02

cedric

3,107
15
54
65
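A note on the question above: ASCII has no Japanese characters, so such a check is normally written against Unicode code point ranges. The question asks about Java and JavaScript; this is only a rough Python sketch of the range test. The function names are hypothetical, and the ranges cover just three Unicode blocks (halfwidth katakana and the CJK extension blocks are left out). Kanji share their block with Chinese, so a match does not prove the text is Japanese.

```python
# Inclusive code point ranges of three Unicode blocks used in Japanese text.
JAPANESE_RANGES = (
    (0x3040, 0x309F),  # Hiragana
    (0x30A0, 0x30FF),  # Katakana
    (0x4E00, 0x9FFF),  # CJK Unified Ideographs (kanji; shared with Chinese)
)

def is_japanese_char(ch: str) -> bool:
    """True if the single character falls inside one of the ranges above."""
    code_point = ord(ch)
    return any(low <= code_point <= high for low, high in JAPANESE_RANGES)

def contains_japanese(text: str) -> bool:
    """True if any character of the text falls inside one of the ranges above."""
    return any(is_japanese_char(ch) for ch in text)
```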

7

votes

2 answers

Understanding Python Unicode and Linux terminal

I have a Python script that writes some strings with UTF-8 encoding. In my script I am mainly using the str() function to cast to string. It looks like this: mystring="this is unicode string:"+japanesevalues[1] #japanesevalues is a list of unicode…

python linux unicode cjk

asked Jul 02 '13 at 06:47

Cesc

648
1
11
22

7

votes

3 answers

Detecting CJK characters in a string (C#)

I am using iTextSharp to generate a series of PDFs, using Open Sans as the default font. On occasion, names are inserted into the content of the PDFs. However, my issue is that some of the names I need to insert contain CJK characters (stored in…

c# .net regex itext cjk

asked May 07 '13 at 08:58

user1961026

7

votes

2 answers

Manipulating utf8mb4 data from MySQL with PHP

This is probably something simple. I swear I've been looking online for the answer and haven't found it. Since my particular case is a little atypical I finally decided to ask here. I have a few tables in MySQL that I'm using for a Chinese language…

php mysql cjk utf8mb4

asked Oct 23 '12 at 10:36

Yhilan

269
1
3
15

7

votes

3 answers

Get the number of bytes needed for a Unicode string

I have a Korean string encoded as Unicode like u'정정'. How do I know how many bytes are needed to represent this string? I need to know the exact byte count since I'm using the string for iOS push notification and it has a limit on the size of the…

python string unicode cjk

asked Aug 06 '12 at 17:11

jasondinh

918
7
21
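For the byte-count question above, the size depends on the encoding chosen, so the usual approach is to encode first and measure the result. A minimal sketch in Python 3 (the question uses the Python 2 `u'…'` literal); the helper name `encoded_size` is hypothetical, and UTF-8 is assumed as the default only because it is a common wire encoding.

```python
def encoded_size(text: str, encoding: str = "utf-8") -> int:
    """Number of bytes the text occupies once encoded."""
    return len(text.encode(encoding))

korean = "정정"  # two Hangul syllables, the string from the question

utf8_size = encoded_size(korean)                # 3 bytes per syllable in UTF-8
utf16_size = encoded_size(korean, "utf-16-le")  # 2 bytes per syllable, no BOM
```

Note that `len(korean)` counts code points (2 here), not bytes, which is why it cannot be used against a byte limit.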

7

votes

2 answers

How to get the length of Japanese characters in Javascript?

I have an ASP Classic page with SHIFT_JIS charset. The meta tag under the page's head section is like this: My page has a text box (txtName) that should only allow 200…

javascript unicode asp-classic cjk shift-jis

asked Jul 12 '12 at 14:22

mark uy

521
1
6
17

7

votes

3 answers

n-gram name analysis in non-English languages (CJK, etc)

I'm working on deduping a database of people. For a first pass, I'm following a basic 2-step process to avoid an O(n^2) operation over the whole database, as described in the literature. First, I "block"- iterate over the whole dataset, and bin each…

python nlp similarity n-gram cjk

asked Apr 05 '12 at 19:34

Matt Luongo

14,371
6
53
64

6

votes

2 answers

Allowing Simplified Chinese Input

The company I work for is bidding on a project that will require our eCommerce solution to accept simplified Chinese input. After doing a bit of research, it seems that ASP.net makes globalization configuration easy: …

c# asp.net sql-server-2005 globalization cjk

asked Mar 23 '12 at 18:48

James Hill

60,353
20
145
161

6

votes

1 answer

Perl regex find character from arbitrary set

I have a file with Korean and Chinese characters. I want to find pairs where parenthetical statements are used to give the hanja for a Korean word, like this: 한문 (漢文) The search would look something like this: /[korean characters] \([chinese…

regex perl cjk

asked Jan 24 '12 at 00:00

Nate Glenn

6,455
8
52
95
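The pattern described in the question above (a Korean word followed by its hanja in parentheses) can be expressed with explicit code point ranges. The question is about Perl; this is only an illustrative Python sketch. The names `HANGUL_HANJA` and `find_hanja_glosses` are hypothetical, and the ranges are an assumption: precomposed Hangul syllables (U+AC00 to U+D7A3) and the main CJK Unified Ideographs block (U+4E00 to U+9FFF) only.

```python
import re

# Group 1: a run of Hangul syllables; group 2: a run of ideographs in parentheses.
HANGUL_HANJA = re.compile(r"([\uAC00-\uD7A3]+) \(([\u4E00-\u9FFF]+)\)")

def find_hanja_glosses(text: str) -> list:
    """Return (korean_word, hanja) tuples for every 'word (hanja)' pair found."""
    return HANGUL_HANJA.findall(text)
```

Hanja outside the main block (for example in the CJK extension or compatibility blocks) would not be matched by these ranges.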

6

votes

1 answer

Faker Python generating Chinese/pinyin names

I am trying to generate random Chinese names using Faker (Python), but it generates the names in Chinese characters instead of pinyin. I found this: and it shows that it generates them in pinyin, while when I try the same code, it gives me only…

python cjk faker

asked Aug 02 '22 at 11:30

Armonia

77
5

6

votes

2 answers

Converting zenkaku characters to hankaku and vice-versa in C#

As it says in the header line, I want to convert zenkaku characters to hankaku ones and vice versa in C#, but can't figure out how to do it. So, say "ラーメン" to "ﾗｰﾒﾝ" and the other way around. Would it be possible to write this in a method which…

c# string format cjk

asked Jun 22 '11 at 02:51

yu_ominae

2,975
6
39
76
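For one direction of the conversion asked about above (hankaku katakana to zenkaku), Unicode NFKC normalization does the job. The question is about C#; this is only a small Python sketch, and the function name `hankaku_kana_to_zenkaku` is hypothetical. Be aware of two limits: NFKC also folds other compatibility characters (fullwidth Latin letters and digits become their ASCII forms), and the standard library has no built-in for the reverse direction, which is not covered here.

```python
import unicodedata

def hankaku_kana_to_zenkaku(text: str) -> str:
    """Fold halfwidth katakana to fullwidth via NFKC compatibility normalization.

    Side effect: every other compatibility character in the text is folded too.
    """
    return unicodedata.normalize("NFKC", text)

converted = hankaku_kana_to_zenkaku("ﾗｰﾒﾝ")  # the sample string from the question
```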

6

votes

2 answers

How to make beautiful line breaks in Japanese?

I have a website in English and Japanese. English is displayed perfectly. There are problems with hyphenation in Japanese. Sometimes hanging 1-2 characters remain on a new line. I want to manage the hyphenation and put it where I need to. I split…

html css cjk

asked Aug 08 '19 at 09:17

VoidArray

185
1
11
