
I am seeing a different Unicode character as the number group separator for the "de-CH" culture when running on a local desktop and in Azure.

When the following code is run on my desktop in .NET Core 3.1 or .NET Framework 4.7.2, it outputs 2019 (U+2019), which looks like an apostrophe but is a different character.

When run in Azure, for instance in https://try.dot.net or (slightly modified) in an Azure Function running on .NET Core 3.1 on a Windows-based App Service, it outputs 0027 (U+0027), a standard ASCII apostrophe.

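using System;
using System.Globalization;
using System.Linq;

class Program
{
    static void Main()
    {
        // NumberGroupSeparator is a string; Single() extracts its only character
        // (and throws if the separator is ever longer than one character).
        char separator = CultureInfo
            .GetCultureInfo("de-CH")
            .NumberFormat
            .NumberGroupSeparator
            .Single();

        // Print the UTF-16 code unit as four hex digits, e.g. 2019 or 0027.
        Console.WriteLine(((int)separator).ToString("X4"));
    }
}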

As a result, parsing the string 4'200.000 (where the apostrophe is U+0027) with the "de-CH" culture fails on my local desktop but succeeds in Azure.
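To confirm that the failure comes from the separator character alone, here is a minimal sketch that builds two number formats differing only in NumberGroupSeparator (U+2019 vs U+0027) and parses the same input with each. The SeparatorDemo class and its methods are hypothetical; the formats are built by hand rather than loaded from the OS, so the result should not depend on the machine.

```csharp
using System.Globalization;

// Hypothetical helper: builds number formats that differ only in the group
// separator, so the outcome does not depend on the OS culture data.
public static class SeparatorDemo
{
    public static NumberFormatInfo WithGroupSeparator(string groupSeparator)
    {
        // Cloning the read-only invariant format yields a writable copy.
        var nfi = (NumberFormatInfo)CultureInfo.InvariantCulture.NumberFormat.Clone();
        nfi.NumberGroupSeparator = groupSeparator;
        nfi.NumberDecimalSeparator = ".";
        return nfi;
    }

    // NumberStyles.Number allows group separators, but an ASCII apostrophe
    // is not treated as a match for a U+2019 separator (or vice versa).
    public static bool TryParseWith(string input, string groupSeparator, out decimal value) =>
        decimal.TryParse(input, NumberStyles.Number, WithGroupSeparator(groupSeparator), out value);
}
```

With these definitions, SeparatorDemo.TryParseWith("4'200.000", "\u2019", out _) returns false, while passing "'" as the separator returns true with a value of 4200.000, mirroring the desktop vs Azure difference.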

Why the difference?

Pac0
Josh Gallagher
  • \u0027 is APOSTROPHE, \u2019 is RIGHT SINGLE QUOTATION MARK. This might be a more appropriate question for Azure Support. (FWIW my Win10 desktop also reports \u2019 for de-CH.) – Ian Kemp Aug 19 '20 at 15:20
  • After testing, I'd recommend using Linux if possible: the return value there is consistent, 2019 in every case. – Jason Pan Sep 04 '20 at 09:30
  • Hope my answer can help you. – Jason Pan Sep 04 '20 at 09:31
  • @JasonPan Thanks for the tip about Linux being \u2019. Unfortunately it would appear that we're seeing \u0027 coming through in real documents that we're parsing, so the behaviour on Windows in Azure is the desirable one. I think the workaround I mentioned in another comment (replacing either with the thousands separator on the runtime platform prior to parsing) will need to be employed. But that doesn't explain why there's an inconsistency. – Josh Gallagher Sep 04 '20 at 12:35
  • According to this discussion, the change happened with the Windows 10 1709 update: https://stackoverflow.com/questions/48498682/windows-culture-settings-apostrophe-vs-right-single-quotation-mark – NineBerry Sep 04 '20 at 13:08
  • @NineBerry that is a good clue, thank you. So I wonder if the Windows OS that Azure Functions (at least) is running on is from prior to that setting changing. In which case I could expect our code to start breaking in Azure soon too. – Josh Gallagher Sep 04 '20 at 15:36
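The workaround Josh Gallagher describes in the comments (replacing either apostrophe with the runtime platform's group separator before parsing) might look roughly like this. It is a sketch under two assumptions: the input uses '.' as its decimal separator, and so does the culture passed in (true for de-CH on the machines discussed here, though NineBerry notes below that this has not always held). The SwissNumberParsing class is a hypothetical name.

```csharp
using System.Globalization;

public static class SwissNumberParsing
{
    // The two characters observed as the de-CH group separator on different platforms.
    private const string AsciiApostrophe = "\u0027";
    private const string RightSingleQuotationMark = "\u2019";

    // Rewrites either apostrophe to whatever group separator the culture reports
    // on the machine running the code, then parses with that culture.
    public static bool TryParseNormalized(string input, CultureInfo culture, out decimal value)
    {
        string groupSeparator = culture.NumberFormat.NumberGroupSeparator;
        string normalized = input
            .Replace(AsciiApostrophe, groupSeparator)
            .Replace(RightSingleQuotationMark, groupSeparator);
        return decimal.TryParse(normalized, NumberStyles.Number, culture, out value);
    }
}
```

Called as SwissNumberParsing.TryParseNormalized("4'200.000", CultureInfo.GetCultureInfo("de-CH"), out var amount), this should succeed whichever apostrophe the local de-CH data happens to use.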

1 Answer


This Microsoft blog post by Shawn Steele explains why you shouldn't rely on culture data being stable (quoted in full because it is no longer online at MSDN):

https://web.archive.org/web/20190110065542/https://blogs.msdn.microsoft.com/shawnste/2005/04/05/culture-data-shouldnt-be-considered-stable-except-for-invariant/

CultureInfo and RegionInfo data represents a cultural, regional, admin or user preference for cultural settings. Applications should NOT make any assumptions that rely on this data being stable. The only exception (this is a rule, so of course there's an exception) is for CultureInfo.InvariantCulture. CultureInfo.InvariantCulture is supposed to remain stable, even between versions.

There are many reasons that cultural data can change. With Whidbey and Custom Cultures the list gets a little longer.

  • The most obvious reason is that there is a bug in the data and we had to make a change. (Believe it or not we make mistakes ;-)) In this case our users (and yours too) want culturally correct data, so we have to fix the bug even if it breaks existing applications.
  • Another reason is that cultural preferences can change. There're lots of ways this can happen, but it does happen:
    • Global awareness, cross-cultural exchange, the changing role of computers and so forth can all affect a cultural preference.
    • International treaties, trade, etc. can change values. The adoption of the Euro changed many countries' currency symbol to €.
    • National or regional regulations can impact these values too.
    • Preferred spelling of words can change over time.
    • Preferred date formats, etc., can change.
  • Multiple preferences could exist for a culture. The preferred best choice can then change over time.
  • Users could have overridden some values, like date or time formats. These can be requested without user override, however we recommend that applications consider using user overrides.
  • Users or administrators could have created a replacement culture, replacing common default values for a culture with company specific, regional specific, or other variations of the standard data.
    • Some cultures may have preferences that vary depending on the setting. A business might have a more formal form than an Internet Café.
    • An enterprise may require a specific date format or time format for the entire organization.
  • Differing versions of the same custom culture, or a culture that's custom on one machine and a Windows-only culture on another machine.

So if you format a string with a particular date/time format, and then try to Parse it later, parse might fail if the version changed, if the machine changed, if the framework version changed (newer data), or if a custom culture was changed. If you need to persist data in a reliable format, choose a binary method, provide your own format or use the InvariantCulture.
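As a minimal C# sketch of the advice in the paragraph above (not code from the blog post), persisting a number and a timestamp through InvariantCulture and the "o" round-trip date/time pattern might look like this. The InvariantPersistence name is hypothetical.

```csharp
using System;
using System.Globalization;

public static class InvariantPersistence
{
    // Writes a decimal in a culture-independent form suitable for storage.
    public static string SaveAmount(decimal amount) =>
        amount.ToString(CultureInfo.InvariantCulture);

    public static decimal LoadAmount(string stored) =>
        decimal.Parse(stored, NumberStyles.Number, CultureInfo.InvariantCulture);

    // "o" is the round-trip pattern; it preserves ticks and DateTimeKind.
    public static string SaveTimestamp(DateTime timestamp) =>
        timestamp.ToString("o", CultureInfo.InvariantCulture);

    public static DateTime LoadTimestamp(string stored) =>
        DateTime.Parse(stored, CultureInfo.InvariantCulture, DateTimeStyles.RoundtripKind);
}
```

Because none of these methods read culture-specific data, the stored strings should parse identically on a Windows 10 desktop, an Azure App Service or Linux.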

Even without changing data, remembering to use Invariant is still a good idea. If you have different . and , syntax for something like 1,000.29, then Parsing can get confused if a client was expecting 1.000,29. I've seen this problem with applications that didn't realize that a user's culture would be different than the developer's culture. Using Invariant or another technique solves this kind of problem.
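To make the 1,000.29 vs 1.000,29 confusion concrete, this sketch parses the same strings with InvariantCulture and with a hand-built comma-decimal NumberFormatInfo. The format is an illustrative stand-in for a culture such as de-DE, not loaded from the OS, and the SeparatorConfusion name is hypothetical.

```csharp
using System.Globalization;

public static class SeparatorConfusion
{
    // Hypothetical comma-decimal format, built by hand (and frozen) so the
    // example does not depend on the machine's culture data.
    public static readonly NumberFormatInfo CommaDecimal =
        NumberFormatInfo.ReadOnly(new NumberFormatInfo
        {
            NumberDecimalSeparator = ",",
            NumberGroupSeparator = "."
        });

    public static decimal ParseInvariant(string s) =>
        decimal.Parse(s, NumberStyles.Number, CultureInfo.InvariantCulture);

    public static decimal ParseCommaDecimal(string s) =>
        decimal.Parse(s, NumberStyles.Number, CommaDecimal);
}
```

Here ParseInvariant("1.000") returns 1 while ParseCommaDecimal("1.000") returns 1000, so the same string silently yields two different values; ParseCommaDecimal("1,000.29") throws a FormatException instead.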

Of course you can't have both "correct" display for the current user and perfect round tripping if the culture data changes. So generally I'd recommend persisting data using InvariantCulture or another immutable format, and always using the appropriate formatting APIs for display. Your application will have its own requirements, so consider them carefully.

Note that for collation (sort order/comparisons), even Invariant behavior can change. You'll need to use the Sort Versioning to get around that if you require consistently stable sort orders.
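In current .NET, the closest counterpart to the sort versioning mentioned above is CompareInfo.Version, which returns a SortVersion. A minimal sketch, assuming you store FullVersion and SortId next to any persisted sorted data; the SortStability class is hypothetical.

```csharp
using System;
using System.Globalization;

public static class SortStability
{
    // Splits the culture's current SortVersion into parts that can be stored
    // next to sorted data.
    public static (int FullVersion, Guid SortId) Capture(CultureInfo culture)
    {
        SortVersion version = culture.CompareInfo.Version;
        return (version.FullVersion, version.SortId);
    }

    // True if the culture's collation is unchanged since the data was sorted;
    // if false, re-sort (or rebuild any index) before relying on the order.
    public static bool IsSortOrderStillValid(int fullVersion, Guid sortId, CultureInfo culture) =>
        new SortVersion(fullVersion, sortId) == culture.CompareInfo.Version;
}
```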

If you need to parse data automatically that is formatted to be user-friendly, there are two approaches:

  • Allow the user to explicitly specify the used format.
  • First remove every character except digits, the minus sign and the decimal separator from the string, then parse it. Note that you need to know the correct decimal separator first. There is no way to guess it correctly, and guessing wrong could result in major problems.
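The second approach could be sketched as follows, assuming the caller already knows the decimal separator (the answer stresses that it cannot be guessed). NumberStripping and TryParseStripped are hypothetical names.

```csharp
using System.Globalization;
using System.Text;

public static class NumberStripping
{
    // Keeps ASCII digits and '-', maps the known decimal separator to '.',
    // and drops everything else (group separators, spaces, currency text...).
    public static bool TryParseStripped(string input, char decimalSeparator, out decimal value)
    {
        var kept = new StringBuilder(input.Length);
        foreach (char c in input)
        {
            if ((c >= '0' && c <= '9') || c == '-')
                kept.Append(c);
            else if (c == decimalSeparator)
                kept.Append('.'); // normalize to the invariant decimal point
        }

        // Only a leading sign and one decimal point are accepted; a '-' that
        // ended up anywhere but the front makes the parse fail.
        return decimal.TryParse(kept.ToString(),
            NumberStyles.AllowLeadingSign | NumberStyles.AllowDecimalPoint,
            CultureInfo.InvariantCulture, out value);
    }
}
```

For example, TryParseStripped("4'200.50", '.', out var v) gives 4200.50 whether the apostrophe is U+0027 or U+2019. The asker's objection in the comments still applies, though: TryParseStripped("12abc34", '.', out v) also succeeds, yielding 1234.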

Wherever possible try to avoid parsing numbers that are formatted to be user-friendly. Instead whenever possible try to request numbers in a strictly defined (invariant) format.
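When you control the producer, a strictly defined invariant format can be enforced on the parsing side by narrowing NumberStyles instead of using the permissive NumberStyles.Number. A sketch (StrictNumbers is a hypothetical name):

```csharp
using System.Globalization;

public static class StrictNumbers
{
    // Accepts only an optional leading sign (+ or -), digits and an optional
    // '.' decimal point. Group separators, whitespace, currency symbols and
    // exponents are all rejected.
    private const NumberStyles Strict =
        NumberStyles.AllowLeadingSign | NumberStyles.AllowDecimalPoint;

    public static bool TryParse(string input, out decimal value) =>
        decimal.TryParse(input, Strict, CultureInfo.InvariantCulture, out value);
}
```

Here "4200.5" parses, while "4,200.5", "4'200.5" and " 4200" are rejected rather than reinterpreted.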

NineBerry
  • Thanks for the answer. It explains how to get around the problem that they're inconsistent but doesn't explain why the two platforms are inconsistent with the same version of the framework. Also, the danger of removing all characters that aren't the decimal separator is that this may make a string into a valid number where it wasn't a valid number in that culture beforehand and could thus result in errors being introduced. I think the only real "fix" here is to do a specific replacement of either character with the thousands separator for that culture on the system running the code. – Josh Gallagher Sep 04 '20 at 12:10
  • You should not rely on the data that is supplied. You have to be aware that even the decimal separator for the default Swiss culture HAS CHANGED from dot to comma in Windows at one point in the past for a certain period. – NineBerry Sep 04 '20 at 12:31
  • These are non-structured documents meant for human reading that we're receiving and parsing. Identifying the culture from an address on the document was working out pretty well for parsing the numbers until we came across this de-CH issue. I think we'll just have to treat \u0027 and \u2019 as synonymous for the moment and see what other quirks come up. – Josh Gallagher Sep 04 '20 at 15:40
  • Just for the record, I would always use `InvariantCulture` if I were in charge of the encoding of numbers to strings meant for software to read! – Josh Gallagher Sep 04 '20 at 15:41
  • @JoshGallagher Switzerland is really a mess in this regard because the decimal separator and group separator used is not consistent. Numbers that represent money will use dot as decimal separator, but other numbers might use comma as decimal separator instead depending on the local policy of the organization... Have a look at the [Wikipedia article on "decimal separator"](https://en.wikipedia.org/wiki/Decimal_separator) and search for Switzerland in that page... – NineBerry Sep 04 '20 at 15:43