Questions tagged [gb2312]

GB 2312 (now GB/T 2312-1980) is a character set for Simplified Chinese, normally encoded as EUC-CN. It has been officially superseded by GBK and GB 18030, which add further characters, but it remains in widespread use.
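
To illustrate the description above, here is a minimal Python sketch. It is not part of the tag wiki and assumes CPython's built-in `gb2312`, `gbk`, and `gb18030` codecs. It shows that text covered by GB 2312 gets the same bytes under all three encodings, and that in EUC-CN both bytes of each character fall in the range 0xA1 to 0xFE.

```python
# Sketch: GB 2312 text keeps the same bytes under GBK and GB 18030.
text = "中文"

gb2312_bytes = text.encode("gb2312")
gbk_bytes = text.encode("gbk")
gb18030_bytes = text.encode("gb18030")

print(gb2312_bytes.hex())                          # d6d0cec4
print(gb2312_bytes == gbk_bytes == gb18030_bytes)  # True

# In EUC-CN, both bytes of a GB 2312 character lie in 0xA1..0xFE.
in_euc_range = all(0xA1 <= b <= 0xFE for b in gb2312_bytes)
print(in_euc_range)                                # True
```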

53 questions
1
vote
1 answer

Is there a platform-independent way to convert between UTF-8 and a plain string?

Here, "plain string" means whatever encoding: a plain string literal such as "plainstring" is encoded as; all standard libraries return or accept. For example: std::cout << "I'm ok."; // plain string, ok on my system, …
ChungkingExpress
  • 323
  • 1
  • 12
1
vote
0 answers

Python (2.7): finding elements with splinter on a gb2312 webpage

I am trying to use splinter in Python to interact with a website (http://bbs.nju.edu.cn/) that is encoded in gb2312, but I am having problems finding elements. I am using Notepad++ with UTF-8 without BOM encoding. I have done a lot of research and practiced to…
1
vote
0 answers

Encoding error in dynamically created XML files due to GB2312 encoding?

I generate XML files with GB2312 encoding using MySQL and PHP, because of the Chinese characters on my site. It worked fine until I moved my website from shared web hosting to an Ubuntu 16.04 VPS and installed Plesk. Now all my XML files throw an encoding…
Hunnenkoenig
  • 193
  • 1
  • 2
  • 14
1
vote
2 answers

How to find whether a character is in GB2312 in Java

I would like to write a Java function like this: if a char is not in GB2312, return false. Boolean isGB2312(String chinese) { ...... }
William_He
  • 357
  • 3
  • 4
  • 11
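
One way to approach a check like this is to attempt the encoding and see whether it fails. The sketch below shows the idea in Python rather than Java; the function name `is_gb2312` is made up for illustration. Note that ASCII characters also pass, because the EUC-CN form of GB2312 includes them.

```python
def is_gb2312(text: str) -> bool:
    """Return True if every character of `text` can be encoded as GB2312."""
    try:
        text.encode("gb2312")
    except UnicodeEncodeError:
        return False
    return True


print(is_gb2312("中文"))  # True
print(is_gb2312("囧"))    # False
```

In Java, `Charset.forName("GB2312").newEncoder().canEncode(...)` can play a similar role, assuming the runtime ships that charset.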
1
vote
2 answers

u'囧'.encode('gb2312') throws UnicodeEncodeError

Firefox can display '囧' in gb2312-encoded HTML, but u'囧'.encode('gb2312') throws UnicodeEncodeError. 1. Is there a map that Firefox uses to look up gb2312-encoded characters, find the 01 display matrix, and display 囧? 2. Is there a map for translating…
user3822769
  • 151
  • 6
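
A likely explanation is that browsers following the WHATWG Encoding Standard treat the label `gb2312` as GBK, a superset that includes 囧. The Python sketch below assumes the built-in `gbk` codec and shows that the character cannot be encoded as strict GB2312 but round-trips through GBK.

```python
ch = "囧"

# Strict GB2312 does not contain this character.
try:
    ch.encode("gb2312")
    in_gb2312 = True
except UnicodeEncodeError:
    in_gb2312 = False

# GBK, the superset browsers use for the "gb2312" label, does.
gbk_bytes = ch.encode("gbk")
round_trip = gbk_bytes.decode("gbk")

print(in_gb2312)         # False
print(len(gbk_bytes))    # 2
print(round_trip == ch)  # True
```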
1
vote
1 answer

Python: gb2312 codec can't decode bytes

I have an encoded-word string from a received email. When parsing the encoded word in Python 3, I get the exception 'gb2312' codec can't decode bytes in position 18-19: illegal multibyte sequence, raised from the make_header method. from email.header import…
Patrik Polakovic
  • 542
  • 1
  • 8
  • 20
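
Errors like this often occur when a mail client labels GBK or GB 18030 bytes as `gb2312`. A possible workaround, sketched below, is to decode the header parts manually and substitute the superset codec. The helper name `decode_mime_header` is hypothetical, and the choice of `gb18030` as the replacement is an assumption about the mail's real encoding.

```python
from email.header import decode_header


def decode_mime_header(raw: str) -> str:
    """Decode a MIME header, treating gb2312/gbk labels as gb18030."""
    parts = []
    for data, charset in decode_header(raw):
        if isinstance(data, bytes):
            codec = (charset or "ascii").lower()
            if codec in ("gb2312", "gbk"):
                codec = "gb18030"  # superset of both; an assumption
            parts.append(data.decode(codec, errors="replace"))
        else:
            # Headers without encoded words come back as str already.
            parts.append(data)
    return "".join(parts)
```

For example, a header whose payload holds the GBK bytes of 囧 but carries the `gb2312` label decodes cleanly this way, whereas the strict `gb2312` codec rejects it.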
1
vote
2 answers

Converting UTF-8 to GB2312 in Java

Just look at the code below: try { String str = "上海上海"; String gb2312 = new String(str.getBytes("utf-8"), "gb2312"); String utf8 = new String(gb2312.getBytes("gb2312"), "utf-8"); System.out.println(str.equals(utf8)); …
查晓明
  • 13
  • 1
  • 3
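
The excerpt appears to decode UTF-8 bytes as if they were GB2312, which relabels the bytes rather than converting them. The general principle is to decode with the source encoding and encode with the target encoding. Here is a minimal sketch of that idea in Python rather than Java:

```python
text = "上海上海"

utf8_bytes = text.encode("utf-8")
gb2312_bytes = text.encode("gb2312")

# Transcoding bytes: decode with the source codec, encode with the target.
transcoded = utf8_bytes.decode("utf-8").encode("gb2312")

print(len(utf8_bytes), len(gb2312_bytes))     # 12 8
print(transcoded == gb2312_bytes)             # True
print(gb2312_bytes.decode("gb2312") == text)  # True
```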
1
vote
3 answers

Read GB2312 encoding page using Ruby

I am trying to parse a GB2312-encoded page (http://news.qq.com/a/20140824/015032.htm), and this is my code. I have not reached the parsing part yet, just opening and reading, and I get an error. This is my code: require…
VHanded
  • 2,079
  • 4
  • 30
  • 55
1
vote
1 answer

How to convert UTF-8 interpreted GB2312 encoding to real UTF-8 encoding?

This is a strange scenario, not the conventional conversion from one encoding to another. Question: I use the Readability API to retrieve the main content from a given URL. It works fine if the target URL is encoded with UTF-8, but when the target URL is encoded in…
jasonslyvia
  • 2,529
  • 1
  • 24
  • 33
1
vote
0 answers

Ruby spidr can't spider a "gb2312"-encoded HTML page

Why can't Ruby spidr spider a "gb2312"-encoded HTML page? Can it only spider "utf-8"-encoded pages? This is my code: Spidr.site('http://www.lookmw.cn/') do |spider| spider.every_page do |page| puts "[-] #{page.url}" end end It answers:…
randy ling
  • 62
  • 8
1
vote
1 answer

How to convert GB2312 (Chinese) characters to UTF-8 inside WebLogic 12?

We have pages that use Simplified Chinese (GB2312) in an HTML form. When we submit the form with 3 Chinese characters in a text field, we receive 6 other (non-Chinese) characters on the server (WebLogic 12). Then we save these 6…
1
vote
1 answer

character encoding issue - GB2312

I am displaying Simplified Chinese characters retrieved from a database using the code snippet below, but it displays junk characters. String text="×°ÏäʱÇëÅÄÕÕ"; // retrieved from database String result=new…
Arun
  • 1,167
  • 18
  • 37
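
Junk text of this shape typically means GB2312 bytes were read as Latin-1. If so, the damage is reversible: re-encode the garbled string as Latin-1 to recover the original bytes, then decode those as GB2312. The sketch below shows this in Python rather than Java; the sample text and the helper name `repair_gb2312_mojibake` are made up for illustration.

```python
def repair_gb2312_mojibake(garbled: str) -> str:
    """Undo 'GB2312 bytes decoded as Latin-1' mojibake."""
    raw = garbled.encode("latin-1")  # recover the original bytes
    return raw.decode("gb2312")      # decode them with the right codec


# Hypothetical sample: simulate the damage, then repair it.
original = "装箱时请拍照"
garbled = original.encode("gb2312").decode("latin-1")

print(repair_gb2312_mojibake(garbled) == original)  # True
```

Because GB2312 lead and trail bytes lie in 0xA1 to 0xFE, where Latin-1 and Windows-1252 agree, the same repair should work if the text was misread as Windows-1252.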
1
vote
2 answers

PHP - UTF-8 to Chinese ANSI (GB2312?) - Export CSV file

I am posting this after several hours of research (on several occasions...). I couldn't find an answer. My goal is to write a CSV file using PHP. This file has to use the Chinese ANSI encoding (I suppose it's GB2312 for Simplified Chinese; in Notepad++ I…
david_b
  • 11
  • 1
  • 3
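
On Simplified Chinese Windows, the "ANSI" code page is usually code page 936, which corresponds to GBK, a superset of GB2312. The sketch below shows the idea in Python rather than PHP: open the output file with the target encoding and let the CSV writer handle quoting. The file name, rows, and helper name `write_gbk_csv` are made up for illustration.

```python
import csv
import os
import tempfile


def write_gbk_csv(path, rows):
    """Write rows to a CSV file encoded as GBK (code page 936)."""
    with open(path, "w", encoding="gbk", newline="") as f:
        csv.writer(f).writerows(rows)


rows = [["姓名", "城市"], ["张三", "上海"]]  # hypothetical sample data
path = os.path.join(tempfile.mkdtemp(), "export.csv")
write_gbk_csv(path, rows)

with open(path, "rb") as f:
    data = f.read()

print(data.decode("gbk").splitlines()[0])  # 姓名,城市
```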
0
votes
2 answers

How to parse RSS with GB2312 encoding in Python

I have an RSS feed which is encoded in GB2312. When I try to parse it using the following code: for item in XML.ElementFromURL(feed).xpath('//item'): title = item.find('title').text it is not able to parse the feed. Any idea how to parse…
Simsons
  • 12,295
  • 42
  • 153
  • 269
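
If the XML parser in use does not recognise the `gb2312` declaration, one workaround is to decode the bytes yourself, drop the XML declaration, and parse the resulting text. The sketch below uses the standard library's `xml.etree.ElementTree` instead of the `XML.ElementFromURL` helper from the excerpt. The function name `parse_gb2312_titles` and the sample feed are made up, and decoding with `gb18030` (a superset of GB2312) is an assumption.

```python
import re
import xml.etree.ElementTree as ET


def parse_gb2312_titles(xml_bytes: bytes) -> list:
    """Return the <title> text of every <item> in a GB2312-encoded feed."""
    text = xml_bytes.decode("gb18030")  # superset codec; an assumption
    # Remove the declaration so the parser does not try to honour it.
    text = re.sub(r"^\s*<\?xml[^>]*\?>", "", text, count=1)
    root = ET.fromstring(text)
    return [item.findtext("title") for item in root.iter("item")]


# Hypothetical feed used only to exercise the function.
sample = (
    '<?xml version="1.0" encoding="gb2312"?>'
    "<rss><channel><item><title>新闻标题</title></item></channel></rss>"
).encode("gb2312")

print(parse_gb2312_titles(sample))  # ['新闻标题']
```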
0
votes
1 answer

Adding support for gb2312 and shift-jis to newlib iconv

I have a requirement to convert UCS2 to the following code pages: Chinese: gb2312; Japanese: shift_jis; Russian: cp1251; Hungarian, Polish and Czech: cp1252; Default: cp1250. I could see that items 3-5 are supported in newlib iconv…
Anas
  • 13
  • 2