
For some reason, on some browsers, a CP-1252 ellipsis (0x85) is showing up as ů. I believe the server is claiming the page is UTF-8 (don't ask me why a UTF-8 server is serving CP-1252; that is out of scope). I would understand the browser throwing a warning, because a lone 0x85 isn't valid UTF-8. I would understand it showing up as the Latin-1 character U+0085 NEXT LINE (NEL). But I can't for the life of me figure out why it displays as U+016F LATIN SMALL LETTER U WITH RING ABOVE.

This is what I am seeing:

[screenshot of the rendered page; image not preserved]

And here is a hexdump -C of the file:

00000000  78 78 78 78 78 78 78 78  78 78 78 78 78 78 78 78  |xxxxxxxxxxxxxxxx|
*
00000030  78 85 3c 2f 69 3e 0d 0a                           |x.</i>..|
00000038
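The expectations described above can be checked against the last bytes of the dump. This is a minimal sketch using Python's built-in codecs, which are an assumption standing in for whatever decoder the browser actually uses; `tail` and `try_decode` are illustrative names, not part of the original page.

```python
# Last eight bytes from the hexdump above: x, 0x85, </i>, CR, LF
tail = bytes.fromhex("78 85 3c 2f 69 3e 0d 0a")

def try_decode(data, encoding):
    """Decode strictly; return the text, or a short error marker on failure."""
    try:
        return data.decode(encoding)
    except UnicodeDecodeError as exc:
        return f"<error: {exc.reason}>"

# Strict UTF-8 rejects a lone 0x85 (it is a continuation byte, not a start byte).
print(repr(try_decode(tail, "utf-8")))

# A lenient UTF-8 decoder substitutes U+FFFD REPLACEMENT CHARACTER instead.
print(repr(tail.decode("utf-8", errors="replace")))

# Latin-1 maps every byte to the code point of the same value, so U+0085.
print(repr(tail.decode("latin-1")))

# CP-1252 is what the file was presumably written in: 0x85 is U+2026.
print(repr(tail.decode("cp1252")))
```

None of these three decodings yields U+016F, which is what makes the observed rendering surprising.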
Chas. Owens
  • The only way this should happen is if the browser is ignoring the page's reported charset and using a different charset instead, such as a user-specified override. But I don't know which charset(s) would interpret 0x85 as U+016F. None of the CP-12xx/Windows-12xx charsets do; in those, 0x85 is U+2026 HORIZONTAL ELLIPSIS. And none of the ISO-8859-x charsets assign a printable character to 0x85 at all. – Remy Lebeau May 30 '16 at 02:18
  • I found a charset that interprets 0x85 as U+016F: [CP852](http://www.kreativekorp.com/charset/encoding.php?name=CP852) (DOS Latin-2), not to be confused with [ISO-8859-2](https://en.m.wikipedia.org/wiki/ISO/IEC_8859-2) (ISO Latin-2). – Remy Lebeau May 30 '16 at 02:25
  • Thanks @RemyLebeau, it seems odd that a normally configured browser is treating some text as DOS Latin-2, but at least that makes more sense than the "it's magic" I was coming up with. I will do some more tests to see if I can duplicate with different characters. – Chas. Owens May 30 '16 at 23:09
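The CP852 finding from the comments can be confirmed with a quick check. This is a small sketch that relies on Python's bundled `cp852` and `iso8859_2` codecs as a stand-in for the linked mapping tables; it says nothing about why a browser would pick CP852.

```python
import unicodedata

byte = bytes([0x85])

# DOS Latin-2 (CP852): 0x85 decodes to the character seen in the browser.
dos_latin2 = byte.decode("cp852")
print(f"cp852:     U+{ord(dos_latin2):04X} {unicodedata.name(dos_latin2)}")

# ISO Latin-2 (ISO-8859-2): Python maps 0x85 to the C1 control U+0085,
# which has no Unicode character name, hence the fallback label.
iso_latin2 = byte.decode("iso8859_2")
print(f"iso8859_2: U+{ord(iso_latin2):04X} {unicodedata.name(iso_latin2, '<control>')}")
```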

1 Answer


This is a flagrant case of mojibake. Some time ago I wrote a small .bat script that shows how (most known) OEM and ANSI code pages map to Unicode and vice versa. Here is its output for byte 0x85; note that CP852 is the only code page listed that maps it to U+016F:

==> alts.bat 0x85
CP/ACP  Hex  Codepoint  #Description   :show8bit 133 <--> 0x85)
------  ---  ---------  ------------------------
CP1250  0x85    0x2026  #HORIZONTAL ELLIPSIS
CP1251  0x85    0x2026  #HORIZONTAL ELLIPSIS
CP1252  0x85    0x2026  #HORIZONTAL ELLIPSIS
CP1253  0x85    0x2026  #HORIZONTAL ELLIPSIS
CP1254  0x85    0x2026  #HORIZONTAL ELLIPSIS
CP1255  0x85    0x2026  #HORIZONTAL ELLIPSIS
CP1256  0x85    0x2026  #HORIZONTAL ELLIPSIS
CP1257  0x85    0x2026  #HORIZONTAL ELLIPSIS
CP1258  0x85    0x2026  #HORIZONTAL ELLIPSIS
CP437   0x85    0x00e0  #LATIN SMALL LETTER A WITH GRAVE
CP737   0x85    0x0396  #GREEK CAPITAL LETTER ZETA
CP775   0x85    0x0123  #LATIN SMALL LETTER G WITH CEDILLA
CP850   0x85    0x00e0  #LATIN SMALL LETTER A WITH GRAVE
CP852   0x85    0x016f  #LATIN SMALL LETTER U WITH RING ABOVE
CP855   0x85    0x0401  #CYRILLIC CAPITAL LETTER IO
CP857   0x85    0x00e0  #LATIN SMALL LETTER A WITH GRAVE
CP860   0x85    0x00e0  #LATIN SMALL LETTER A WITH GRAVE
CP861   0x85    0x00e0  #LATIN SMALL LETTER A WITH GRAVE
CP862   0x85    0x05d5  #HEBREW LETTER VAV
CP863   0x85    0x00e0  #LATIN SMALL LETTER A WITH GRAVE
CP864   0x85    0x2500  #FORMS LIGHT HORIZONTAL
CP865   0x85    0x00e0  #LATIN SMALL LETTER A WITH GRAVE
CP866   0x85    0x0415  #CYRILLIC CAPITAL LETTER IE
CP869   0x85            #UNDEFINED
CP874   0x85    0x2026  #HORIZONTAL ELLIPSIS
CP932   0x85            #DBCS LEAD BYTE
CP936   0x85            #DBCS LEAD BYTE
CP949   0x85            #DBCS LEAD BYTE
CP950   0x85            #DBCS LEAD BYTE

==>
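For readers without the .bat script, a similar byte-to-code-point lookup can be sketched in Python. This is not the author's script: `CODE_PAGES` and `lookup_byte` are hypothetical names, the list covers only a subset of the code pages above, and Python's codec tables may differ in places from the Windows ones.

```python
import unicodedata

# Illustrative subset of the code pages in the table above (an assumption).
CODE_PAGES = ["cp1250", "cp1252", "cp437", "cp850", "cp852", "cp866", "cp869", "cp932"]

def lookup_byte(value, code_pages=CODE_PAGES):
    """Return (code page, code point or None, description) for one byte value."""
    rows = []
    for cp in code_pages:
        try:
            ch = bytes([value]).decode(cp)
        except UnicodeDecodeError:
            # Either the byte is unassigned or it is a DBCS lead byte.
            rows.append((cp, None, "UNDEFINED OR LEAD BYTE"))
            continue
        rows.append((cp, ord(ch), unicodedata.name(ch, "<unnamed>")))
    return rows

for cp, codepoint, name in lookup_byte(0x85):
    shown = f"0x{codepoint:04x}" if codepoint is not None else ""
    print(f"{cp.upper():<7} 0x85    {shown:<6}  #{name}")
```

Collapsing "undefined" and "lead byte" into one label keeps the sketch short; the script above distinguishes the two cases.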

And the reverse lookup, for code point 0x2026 (sorry about the shifted output columns on the non-Windows code page lines):

==> alts.bat 0x2026
CP/ACP  Hex  Codepoint  #Description   :show16bit 8230 <--> 0x2026
------  ---  ---------  -------------------------
CP1250  0x85    0x2026  #HORIZONTAL ELLIPSIS
CP1251  0x85    0x2026  #HORIZONTAL ELLIPSIS
CP1252  0x85    0x2026  #HORIZONTAL ELLIPSIS
CP1253  0x85    0x2026  #HORIZONTAL ELLIPSIS
CP1254  0x85    0x2026  #HORIZONTAL ELLIPSIS
CP1255  0x85    0x2026  #HORIZONTAL ELLIPSIS
CP1256  0x85    0x2026  #HORIZONTAL ELLIPSIS
CP1257  0x85    0x2026  #HORIZONTAL ELLIPSIS
CP1258  0x85    0x2026  #HORIZONTAL ELLIPSIS
CP874   0x85    0x2026  #HORIZONTAL ELLIPSIS
CP932   0x8163  0x2026  #HORIZONTAL ELLIPSIS
CP936   0xA1AD  0x2026  #HORIZONTAL ELLIPSIS
CP949   0xA1A6  0x2026  #HORIZONTAL ELLIPSIS
CP950   0xA14B  0x2026  #HORIZONTAL ELLIPSIS
macCYRILLIC_CP  0xC9    0x2026  #HORIZONTAL ELLIPSIS
macGREEK_CP     0xC9    0x2026  #HORIZONTAL ELLIPSIS
macICELAND_CP   0xC9    0x2026  #HORIZONTAL ELLIPSIS
macLATIN2_CP    0xC9    0x2026  #HORIZONTAL ELLIPSIS
macROMAN_CP     0xC9    0x2026  #HORIZONTAL ELLIPSIS
macTURKISH_CP   0xC9    0x2026  #HORIZONTAL ELLIPSIS

==>
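The reverse direction can be sketched the same way: encode U+2026 with each codec and keep the ones that succeed. Again, this is an illustration built on Python's codecs rather than the script itself, and `reverse_lookup` and the codec list are hypothetical.

```python
def reverse_lookup(char, code_pages):
    """Return (code page, hex bytes) for every code page that can encode char."""
    rows = []
    for cp in code_pages:
        try:
            encoded = char.encode(cp)
        except UnicodeEncodeError:
            continue  # this code page has no slot for the character
        rows.append((cp, "0x" + encoded.hex().upper()))
    return rows

# Illustrative mix of Windows, DOS, DBCS, and Mac codecs (an assumption).
codecs_to_try = ["cp1252", "cp874", "cp437", "cp852", "cp932", "mac_roman"]

for cp, hex_bytes in reverse_lookup("\u2026", codecs_to_try):
    print(f"{cp:<10} {hex_bytes:<7} 0x2026  #HORIZONTAL ELLIPSIS")
```

CP437 and CP852 drop out of the result because neither has an ellipsis, which matches their absence from the table above.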

Further reading: Encodings and Code Pages

JosefZ