My test server VM has been upgraded by corporate IT from Windows 2008 R2 to Windows 2016 Server (via 2012).

Some of my tests no longer run correctly, and I have tracked the problem down to character encoding.

The easiest way to reproduce it is this: when I open an XML document in Notepad++, the Encoding menu shows it as encoded in UTF-8. (The document comes from a TFS build checkout, if that matters.)

But the document contains French accented characters (e.g. "é"), and these show up on screen as two-character sequences ("Ã©", byte sequence 0xC3,0xA9).

On my dev PC, as well as on the previous install, I see the accented characters just fine!

The file on disk is the same (encoded in UTF-8): I verified with a hex editor that it contains the byte sequence 0xC3,0xA9.
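As an illustration of the symptom (a minimal Python sketch, not code from the test system), the same two bytes give one character when decoded as UTF-8 and two characters when decoded with a single-byte code page. Windows-1252 is used here only as an assumed example of such a code page.

```python
# The two bytes stored on disk for the accented character (U+00E9) in UTF-8.
raw = bytes([0xC3, 0xA9])

# Decoded as UTF-8, the pair forms a single character.
as_utf8 = raw.decode("utf-8")

# Decoded as Windows-1252 (assumed here as the legacy code page),
# each byte is mapped to its own character.
as_cp1252 = raw.decode("cp1252")

print(repr(as_utf8), len(as_utf8))
print(repr(as_cp1252), len(as_cp1252))
```

If a program shows the second form, it is most likely decoding the file with a single-byte code page rather than UTF-8.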

Does my new Windows machine somehow lack the ability to decode UTF-8 properly?

My test system also reads text files and constructs disk paths from their contents, so it is affected by this issue too. I chose to report the behaviour seen in Notepad++ because it is independent of my test system and the cause is most likely the same.

I don't really know where to look for this. Can somebody help?

Scrontch
  • How does it appear in Notepad? – Greg Askew Jul 16 '19 at 14:04
  • @Greg Askew: Notepad behaves the same as Notepad++: it shows "é" on my machine, two characters on the Win2016 machine. – Scrontch Jul 16 '19 at 14:55
  • I would suspect this may be due to Visual Studio Source Explorer settings/TFS version control encodings. – Greg Askew Jul 16 '19 at 15:19
  • @Greg Askew: No, I verified with a hex editor: on both machines the files are binary-identical and "physically" UTF-8 (without BOM), i.e. the accented character is encoded as its two-byte UTF-8 pair. It's just shown as the right character on one machine and as two nonsense characters on the other, for *any* application that shows text, it seems. From what I have read, I suspect it is a Windows code page issue, but I don't know how to change that. I've set "Language for non-Unicode programs" (system locale) to the same value (English US) on both machines, but still no luck. – Scrontch Jul 16 '19 at 15:29
  • _It shows "é" on my machine, two characters on the Win2016 machine_. **Which** two characters, literally? E.g. `Ã©` (byte sequence `0xC3`,`0xA9`)? – JosefZ Jul 16 '19 at 15:34
  • _My test system also reads text files, and constructs disk paths from the contents._ Please [edit] the question and share a relevant part of used code (which scripting/programming language?)… – JosefZ Jul 16 '19 at 15:43
  • @JosefZ: Yes, it shows Ã© (byte sequence 0xC3,0xA9). I will edit the question. – Scrontch Jul 16 '19 at 15:45
  • So if you open Notepad, paste in Ç, save it as a UTF-8 text file, and re-open it, it has garbled characters? Also, the Language for non-Unicode programs should never be needed for anything. – Greg Askew Jul 16 '19 at 18:11
  • OK, sorry, I found my issue. A PowerShell script that only runs on the server was reading and modifying my text files, and it seems it was using a different encoding. Thanks for the help. It led me to look in the right direction. – Scrontch Jul 17 '19 at 07:31
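The failure mode described in the last comment can be sketched in Python. This is only an illustration, not the actual PowerShell script: it assumes the script decoded the BOM-less UTF-8 file with a legacy code page (Windows-1252 is an assumption) and wrote it back as UTF-8. The function name and the sample file are hypothetical.

```python
import os
import tempfile


def rewrite_with_wrong_encoding(path, read_encoding="cp1252", write_encoding="utf-8"):
    """Hypothetical stand-in for a script that reads with the wrong encoding."""
    # Read the file as if it were in a legacy single-byte code page.
    with open(path, "r", encoding=read_encoding, newline="") as f:
        text = f.read()
    # Write the (already mis-decoded) text back as UTF-8.
    with open(path, "w", encoding=write_encoding, newline="") as f:
        f.write(text)


with tempfile.TemporaryDirectory() as tmp:
    path = os.path.join(tmp, "sample.xml")
    # Original file: UTF-8 without BOM; "\u00e9" is the accented e.
    with open(path, "wb") as f:
        f.write("<name>\u00e9</name>".encode("utf-8"))

    rewrite_with_wrong_encoding(path)

    with open(path, "rb") as f:
        damaged = f.read()

# The two original bytes have become four, and a correct UTF-8 reader
# now genuinely sees two characters.
print(damaged)
print(damaged.decode("utf-8"))
```

After such a rewrite the file is still valid UTF-8, so an editor reports it as UTF-8 even though the accented character has been replaced by two others.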

1 Answer


In our case, it was caused by a Beta feature on Windows Server. The solution is to uncheck this option. Open Regional options (run intl.cpl from the Run window),

Select the Administrative tab,

Click "Change System Locale",

Uncheck "Beta: Use Unicode UTF-8 for worldwide language support". (A restart is required after the change.)
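To see whether this option is currently in effect before changing it, one possibility is to query the active ANSI code page, which is 65001 when the system locale is set to UTF-8. The following is a minimal Python sketch under that assumption; the helper names are hypothetical, and the Windows call is only made when running on Windows.

```python
import ctypes
import sys

UTF8_CODE_PAGE = 65001  # Windows code page identifier for UTF-8


def get_active_code_page():
    """Return the active ANSI code page on Windows, or None elsewhere."""
    if sys.platform != "win32":
        return None
    # GetACP is a Win32 API in kernel32 that returns the ANSI code page id.
    return ctypes.windll.kernel32.GetACP()


def describe_code_page(acp):
    """Turn a code page number into a short human-readable hint."""
    if acp is None:
        return "not running on Windows"
    if acp == UTF8_CODE_PAGE:
        return "ANSI code page is UTF-8 (65001); the Beta option appears to be enabled"
    return f"ANSI code page is {acp}; the Beta option appears to be disabled"


print(describe_code_page(get_active_code_page()))
```

Running it before and after the restart gives a quick indication of whether the change took effect.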

Asaf Pala