I've heard that there are (or used to be?) ambiguous mappings between Unicode and SHIFT_JIS codes. This KB article seems to confirm it.

So the question is: will I lose any data if I take SHIFT_JIS-encoded text, convert it to Unicode and back?

Details: I'm talking about Windows (XP and later) and .NET (which, in theory, relies on the NLS API).

DreamSonic

1 Answer

Yes, it looks like this will still lose data:

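using System;
using System.Text;

class Test
{
    static void Main(string[] args)
    {
        // Code page 932 is the Windows variant of Shift_JIS.
        Encoding shiftJis = Encoding.GetEncoding(932);
        byte[] original = new byte[] { 0x87, 0x90 };
        // Decode to a .NET (UTF-16) string, then encode back to code page 932.
        string text = shiftJis.GetString(original);
        byte[] backAgain = shiftJis.GetBytes(text);
        // X2 prints each byte as two uppercase hex digits.
        Console.WriteLine("{0:X2}{1:X2}", backAgain[0], backAgain[1]);
    }
}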

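This prints 81E0 rather than the original 8790, as predicted by the page you linked to. Both byte sequences decode to the same character, U+2252 (≒), and only one of them can be chosen when encoding back.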
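To see how widespread this is on a given machine, the same check can be run over every candidate double-byte sequence. The following is only a rough sketch: the class and method names (RoundTripScan, FindLossyPairs) are made up for illustration, the lead/trail byte ranges are the usual Shift_JIS double-byte ranges, and it assumes that requesting exception fallbacks makes undefined sequences throw instead of being silently replaced. The exact list it prints may differ between Windows and .NET versions.

```csharp
using System;
using System.Collections.Generic;
using System.Text;

// Hypothetical helper, not part of .NET: lists two-byte sequences that decode
// under the given encoding but do not encode back to the same bytes.
static class RoundTripScan
{
    public static List<string> FindLossyPairs(Encoding strict)
    {
        var lossy = new List<string>();
        for (int lead = 0x81; lead <= 0xFC; lead++)
        {
            // 0xA0-0xDF are not lead bytes in Shift_JIS (single-byte katakana range).
            if (lead >= 0xA0 && lead <= 0xDF) continue;

            for (int trail = 0x40; trail <= 0xFC; trail++)
            {
                byte[] original = { (byte)lead, (byte)trail };
                string text;
                try
                {
                    text = strict.GetString(original);
                }
                catch (DecoderFallbackException)
                {
                    continue; // not a defined sequence in this code page
                }

                byte[] backAgain;
                try
                {
                    backAgain = strict.GetBytes(text);
                }
                catch (EncoderFallbackException)
                {
                    backAgain = new byte[0]; // decoded, but cannot be encoded back at all
                }

                bool same = backAgain.Length == 2
                    && backAgain[0] == original[0]
                    && backAgain[1] == original[1];
                if (!same)
                {
                    lossy.Add(string.Format("{0:X2}{1:X2} -> {2}",
                        original[0], original[1],
                        BitConverter.ToString(backAgain).Replace("-", "")));
                }
            }
        }
        return lossy;
    }

    static void Main()
    {
        // On .NET Core / .NET 5+ (not the .NET Framework discussed here), code page 932
        // is only available after:
        //   Encoding.RegisterProvider(CodePagesEncodingProvider.Instance);
        Encoding strict = Encoding.GetEncoding(
            932, EncoderFallback.ExceptionFallback, DecoderFallback.ExceptionFallback);

        foreach (string line in FindLossyPairs(strict))
        {
            Console.WriteLine(line);
        }
    }
}
```

Each printed line has the form "original -> re-encoded", so the pair from the example above should show up as "8790 -> 81E0" if the mapping behaves the same way on your system.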

Jon Skeet