1

Firefox can display '囧' in HTML that is declared as gb2312-encoded, but u'囧'.encode('gb2312') raises UnicodeEncodeError.

1. Is there a map in which Firefox looks up gb2312-encoded characters, finds the corresponding 0/1 display matrix (bitmap glyph), and displays it?

2. Is there a map for translating Unicode to gb2312, and is u'囧' simply not in that map?

vabada
user3822769

2 Answers

3

囧 is not in gb2312; use gb18030 instead. My guess is that Firefox extends its decoding to a larger encoding when it encounters characters outside gb2312.
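
A minimal sketch of that difference, assuming Python 3 (where the `u''` prefix is still accepted). The helper `try_encode` is a hypothetical name introduced only for this illustration:

```python
# -*- coding: utf-8 -*-
ch = u'囧'  # U+56E7


def try_encode(text, codec):
    """Return the encoded bytes, or None if the codec cannot represent text."""
    try:
        return text.encode(codec)
    except UnicodeEncodeError:
        return None


# gb2312 has no code point for this character, so encoding fails.
gb2312_result = try_encode(ch, 'gb2312')

# gb18030 covers all of Unicode, so encoding succeeds and round-trips.
gb18030_result = try_encode(ch, 'gb18030')

print(gb2312_result)
print(gb18030_result)
```

Catching the exception rather than letting it propagate makes it easy to fall back from the narrow codec to the wider one.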

3

When people or software say that something is GB2312-encoded, they most often mean that it is encoded in GBK, a.k.a. CP936 from Microsoft. GB2312 dates from the 1980s and is a subset of GBK; both belong to the same family of encodings.

Incidentally, the forthcoming WHATWG Encoding specification recommends treating any text labelled "gb2312" as GBK-encoded text.

Therefore, try u'囧'.encode('gbk') or u'囧'.encode('cp936') or u'囧'.encode('windows-936').
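
As a rough sketch of this suggestion, assuming Python 3: the snippet below exercises only the codec names `gbk` and `cp936`, since which other aliases are registered depends on the Python codec alias table. It also imitates, in a simplified way, what a lenient consumer does when it treats "gb2312"-labelled bytes as GBK.

```python
# -*- coding: utf-8 -*-
ch = u'囧'  # U+56E7

# Both names resolve to the same codec in CPython.
encoded = ch.encode('gbk')
encoded_cp936 = ch.encode('cp936')

# Treating the bytes as GBK recovers the character.
decoded_as_gbk = encoded.decode('gbk')

# A strict gb2312 decoder rejects the same bytes.
try:
    encoded.decode('gb2312')
    strict_ok = True
except UnicodeDecodeError:
    strict_ok = False

print(encoded == encoded_cp936, decoded_as_gbk == ch, strict_ok)
```

This mirrors the situation in the question: the bytes are valid GBK, so a reader that maps the "gb2312" label to GBK can display them, while a strict gb2312 codec cannot.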

Bruno Haible
  • More precisely, GBK is not even the same as cp936, notably the euro sign is in cp936 but not in GBK, and 95 more characters are in GBK but not in cp936. – Chenfeng Oct 31 '18 at 23:32