I'm posting this as an answer to two questions that are almost, but not quite, asked in the OP's question post:
After Windows 10's breaking changes to support BCP-47...
- How can I tell if a given `CultureInfo` object is a "real" culture, or a fake/contrived/private `CultureInfo` created from scratch in code?
- How can I tell if a user-supplied `String cultureName` value is valid for `new CultureInfo(String)`, and that the runtime environment (.NET and/or OS) has meaningful culture data for that name (more than just the `DisplayName`)?
Question 1: Validating a given `CultureInfo` instance:
As per the documentation for `CultureTypes`, prior to Windows 10, if the `CultureInfo.CultureTypes` property had the `UserCustomCulture` flag then it was a custom culture. Since Windows 10, the `UserCustomCulture` flag still indicates custom cultures, but it is also set on "system cultures that are not backed by a complete set of cultural data and that do not have unique local identifiers".
So if you want to validate a `CultureInfo` on Windows 10 the same way as on Windows 8.1 or earlier, just check that:
- The `CultureInfo.CultureTypes` value does not have the `CultureTypes.UserCustomCulture` flag set.
- Or, if it does have `UserCustomCulture`, that `CultureInfo.ThreeLetterWindowsLanguageName != "ZZZ"`.
    - The `"ZZZ"` magic string seems to come from Windows itself, and it only appears on Windows 10 or later.
    - .NET Core's own test cases include a test for it, but never explain it beyond the comment in this line: `.GetThreeLetterWindowsLanguageName(cultureName) ?? "ZZZ" /* default lang name */;`
So this works for me:

```csharp
using System;
using System.Globalization;

public static class CultureInfoValidation
{
    public static Boolean ValidateCultureInfoWithPreWindows10Logic( CultureInfo ci )
    {
        Boolean hasUserCustom = ( ci.CultureTypes & CultureTypes.UserCustomCulture ) == CultureTypes.UserCustomCulture;
        if( hasUserCustom )
        {
            if( ci.ThreeLetterWindowsLanguageName == "ZZZ" )
            {
                // Windows doesn't have a name for this language, so this CultureInfo is invalid under Windows 8.1 or earlier.
                return false;
            }
            else
            {
                // The `UserCustomCulture` flag means *some* CultureData is missing, but not enough to make it useless.
                // On both Win8 and Win10, the same 8 Neutral Cultures match here: [ jv, jv-Latn, mg, nqo, sn, sn-Latn, zgh, zgh-Tfng ]
                return true;
            }
        }
        else
        {
            // The `UserCustomCulture` flag is not set, which means 100% of the CultureInfo's CultureData exists in the system.
            return true;
        }
    }
}
```
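The claim about the same 8 Neutral Cultures is easy to re-check on your own machine. The following is a minimal sketch (the class and method names are hypothetical, not part of .NET) that lists the cultures landing in the "has `UserCustomCulture` but not `"ZZZ"`" branch of the method above. The result depends on the OS and runtime, so no expected output is given here.

```csharp
using System;
using System.Collections.Generic;
using System.Globalization;
using System.Linq;

public static class UserCustomCultureSurvey
{
    // Returns the names of cultures that carry the UserCustomCulture flag but still
    // have a Windows three-letter language name, i.e. the cultures for which
    // ValidateCultureInfoWithPreWindows10Logic takes the "return true" path under hasUserCustom.
    public static List<String> PartiallyBackedButNamed( CultureTypes types )
    {
        return CultureInfo.GetCultures( types )
            .Where( ci => ( ci.CultureTypes & CultureTypes.UserCustomCulture ) == CultureTypes.UserCustomCulture )
            .Where( ci => ci.ThreeLetterWindowsLanguageName != "ZZZ" )
            .Select( ci => ci.Name )
            .OrderBy( name => name, StringComparer.Ordinal ) // stable ordering for easy diffing between machines
            .ToList();
    }
}
```

For example, `String.Join( ", ", UserCustomCultureSurvey.PartiallyBackedButNamed( CultureTypes.NeutralCultures ) )` gives a line you can compare across Windows versions.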
Question 2: Validating a given `String cultureName`:
Remember that a culture name is hierarchical, with 3 main levels:
- Invariant = `CultureInfo.InvariantCulture`.
- Neutral = a language name without a region, e.g. `en`, `fr`, etc.
- Specific = a language name for a specific region, e.g. `en-US`, `en-GB`, `fr-CA`, `fr-FR`.
    - Additionally, there are some names below the Specific level, e.g. `ca-ES-valencia` (where `valencia` is a BCP-47 variant subtag). I've never encountered more than 3 levels of depth, though.
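One way to see this hierarchy in code is to follow `CultureInfo.Parent` until you reach the invariant culture, which is its own parent and has an empty `Name`. This is a sketch with a hypothetical helper name; the depth cap is only a defensive assumption against an unexpected cycle, not something .NET documents as necessary.

```csharp
using System;
using System.Collections.Generic;
using System.Globalization;

public static class CultureHierarchy
{
    // Follows CultureInfo.Parent from the given culture up to InvariantCulture,
    // returning each level's Name (InvariantCulture's Name is the empty string).
    public static List<String> GetChain( CultureInfo ci )
    {
        var names = new List<String>();
        CultureInfo current = ci;

        // InvariantCulture is its own parent, so stop once we reach an empty Name.
        // The depth cap only guards against an unexpected parent cycle.
        for( int depth = 0; depth < 16; depth++ )
        {
            names.Add( current.Name );
            if( current.Name.Length == 0 ) break;
            current = current.Parent;
        }
        return names;
    }
}
```

On a typical setup, `CultureHierarchy.GetChain( new CultureInfo( "en-US" ) )` should give `en-US`, `en`, and then the empty invariant name; running it with `ca-ES-valencia` shows how your particular runtime parents the variant.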
Validating a `cultureName` depends on your business/domain/application requirements:
- If you want to require the name to match an OS-known language and region, then it's sufficient to call `ValidateCultureInfoWithPreWindows10Logic( new CultureInfo( cultureName ) )`, catching `CultureNotFoundException` (after validating that the format of `cultureName` complies with BCP-47, of course).
- If you want to require the name to match an OS-known language, but allow any OS-known region to be specified, even if the OS doesn't have Specific `CultureData` for it (e.g. when using `CultureInfo.CreateSpecificCulture("en-FR")`), then checking `ci.ThreeLetterWindowsLanguageName != "ZZZ"` is sufficient.
- If you want to require the name to match an OS-known language, but allow any region to be specified, even if the OS doesn't know about the region at all, then it's complicated...
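The first two policies above can be sketched as a single method. This assumes a hypothetical `CultureNamePolicy` enum and `CultureNameChecks` class (neither is a .NET API), performs no BCP-47 syntax validation of its own, and treats `CultureNotFoundException` as a rejection. For the second policy it uses `CultureInfo.CreateSpecificCulture`, following the `en-FR` example above.

```csharp
using System;
using System.Globalization;

// Hypothetical policy names, for illustration only.
public enum CultureNamePolicy
{
    KnownLanguageAndRegion, // first bullet: OS-known language *and* region
    KnownLanguageAnyRegion  // second bullet: OS-known language, region may lack Specific data
}

public static class CultureNameChecks
{
    public static Boolean IsAcceptable( String cultureName, CultureNamePolicy policy )
    {
        // Reject null/blank input up-front; "" would otherwise mean InvariantCulture.
        if( String.IsNullOrWhiteSpace( cultureName ) ) return false;

        CultureInfo ci;
        try
        {
            ci = policy == CultureNamePolicy.KnownLanguageAnyRegion
                ? CultureInfo.CreateSpecificCulture( cultureName )
                : new CultureInfo( cultureName );
        }
        catch( CultureNotFoundException )
        {
            // The OS/runtime has no data at all for this name.
            return false;
        }

        Boolean knownLanguage = ci.ThreeLetterWindowsLanguageName != "ZZZ";
        if( policy == CultureNamePolicy.KnownLanguageAnyRegion ) return knownLanguage;

        // KnownLanguageAndRegion: same rule as ValidateCultureInfoWithPreWindows10Logic.
        Boolean hasUserCustom = ( ci.CultureTypes & CultureTypes.UserCustomCulture ) == CultureTypes.UserCustomCulture;
        return !hasUserCustom || knownLanguage;
    }
}
```

The third policy is deliberately left out, since (as noted above) it has no simple answer.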
Here's a table showing the results of `new CultureInfo` vs. `CultureInfo.CreateSpecificCulture` on Windows 10 vs. Windows Server 2012 R2, and .NET 4.8 vs. .NET 6:

| Expression | Windows 10 + .NET 6 | Windows 10 + .NET 4.8 | Windows Server 2012 R2 + .NET 4.8 |
|---|---|---|---|
| `CultureInfo ci1 = new CultureInfo("en-FR")` | | | |
| `ci1.DisplayName` | "English (France)" | "Unknown Locale (en-FR)" | CultureNotFoundException |
| `ci1.ThreeLetterWindowsLanguageName` | "ZZZ" | "ENU" | CultureNotFoundException |
| `ci1.CultureTypes` | SpecificCultures \| UserCustomCulture \| InstalledWin32Cultures | SpecificCultures \| UserCustomCulture | CultureNotFoundException |
| `CultureInfo spec = CultureInfo.CreateSpecificCulture("en-FR")` | | | |
| `spec.DisplayName` | "English (France)" | "Unknown Locale (en-FR)" | "English (United States)" |
| `spec.ThreeLetterWindowsLanguageName` | "ZZZ" | "ENU" | "ENU" |
| `spec.CultureTypes` | SpecificCultures \| UserCustomCulture \| InstalledWin32Cultures | SpecificCultures \| UserCustomCulture | SpecificCultures \| InstalledWin32Cultures \| FrameworkCultures |

So far, so very inconsistent.
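Because the results vary so much between OS and runtime versions, it may be worth regenerating these rows on your own target machines. A minimal sketch, with a hypothetical `CultureProbe` helper, that reports the same three properties for both construction paths and records `CultureNotFoundException` instead of throwing:

```csharp
using System;
using System.Globalization;
using System.Text;

public static class CultureProbe
{
    // Produces one line per table row for an arbitrary culture name.
    public static String Describe( String cultureName )
    {
        var sb = new StringBuilder();
        AppendRows( sb, "new CultureInfo", () => new CultureInfo( cultureName ) );
        AppendRows( sb, "CreateSpecificCulture", () => CultureInfo.CreateSpecificCulture( cultureName ) );
        return sb.ToString();
    }

    private static void AppendRows( StringBuilder sb, String label, Func<CultureInfo> create )
    {
        CultureInfo ci;
        try
        {
            ci = create();
        }
        catch( CultureNotFoundException ex )
        {
            // Matches the "CultureNotFoundException" cells in the table.
            sb.AppendLine( label + ": " + ex.GetType().Name );
            return;
        }
        sb.AppendLine( label + ".DisplayName: " + ci.DisplayName );
        sb.AppendLine( label + ".ThreeLetterWindowsLanguageName: " + ci.ThreeLetterWindowsLanguageName );
        sb.AppendLine( label + ".CultureTypes: " + ci.CultureTypes );
    }
}
```

Note that `CultureTypes.ToString()` separates flags with `, ` rather than the ` | ` used in the table, and that `DisplayName` may be localized to the current UI language, so compare `EnglishName` as well if the output looks different.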
If you want to allow arbitrary language names, even if the OS doesn't know about the language (let alone the region), be it a Neutral or Specific `CultureInfo`... uhh... I'll have to answer that question later.
Other tips: how to reliably validate `cultureName` when you want it restricted to OS-supported cultures (Neutral and/or Specific):
A quick fix is to have this:
```csharp
using System;
using System.Collections.Generic;
using System.Globalization;
using System.Linq;

public static class KnownCultureInfoNameValidator
{
    // Culture names are case-insensitive ("en-US" vs. "EN-us"), so all sets compare ignoring case.
    // Static initializers run in textual order: this set must stay declared before the two sets below.
    private static readonly HashSet<String> _preWindows10BuiltInCustomNames = new String[]
    {
        "jv", "jv-Latn", "mg", "nqo", "sn", "sn-Latn", "zgh", "zgh-Tfng"
    }
    .ToHashSet( StringComparer.OrdinalIgnoreCase );

    private static readonly HashSet<String> _knownLanguages = BuildHashSet( CultureInfo.GetCultures( CultureTypes.NeutralCultures ) );
    private static readonly HashSet<String> _knownSpecific  = BuildHashSet( CultureInfo.GetCultures( CultureTypes.SpecificCultures ) );

    private static HashSet<String> BuildHashSet( IEnumerable<CultureInfo> cultures )
    {
        return cultures
            .Where( ci => ci.ThreeLetterWindowsLanguageName != "ZZZ" )
            .Where( ci => ci.LCID != 127 ) // Exclude InvariantCulture (LCID 127 == 0x7F)
#if LIKE_PRE_WINDOWS_10
            .Where( ci =>
                _preWindows10BuiltInCustomNames.Contains( ci.Name )
                ||
                ( ci.CultureTypes & CultureTypes.UserCustomCulture ) == 0
            )
#endif
            .Select( ci => ci.Name )
            .ToHashSet( StringComparer.OrdinalIgnoreCase );
    }

    // Only returns true if `cultureName` is an OS-known culture with sufficient OS-provided culture data.
    // This method returns false for partially-known cultures.
    public static Boolean ValidateCultureName( String cultureName, Boolean allowNeutral, Boolean allowSpecific )
    {
        if( allowNeutral  && _knownLanguages.Contains( cultureName ) ) return true;
        if( allowSpecific && _knownSpecific.Contains( cultureName ) ) return true;
        return false;
    }
}
```
Research:
I've been poring over the internals of .NET's `CultureInfo` and (internal) `CultureData`; here are my findings:
When a new `CultureData` instance is created using any of the `String name` constructors (including internally), a new empty `CultureData` object is created, and then its `sRealName` and `bUseOverrides` fields are set to the earlier `cultureName` and `useUserOverride` values (respectively) from the `CultureInfo` constructor's call site.
This `CultureData` is then passed into a function, `nativeInitCultureData`, that's internal to the .NET CLR runtime (i.e. `MethodImplOptions.InternalCall`).
So we can't use `CultureInfo.LCID`, because that's now `4096 == 0x1000` for system-provided but partial `CultureInfo` objects, just as it is for "fake" `CultureInfo` objects.
We can't use `CultureInfo.CompareInfo.LCID` either, because there are still a lot of "real" (but also incomplete) system-provided cultures with `0x1000` there, such as
Because Windows 10 now always returns a non-null string value for `sWindowsName` whenever any BCP-47-compliant input `cultureName` is used, there's no instant way to detect "fake" vs. "real" `CultureInfo` objects in .NET.
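For reference, the two LCID-based checks ruled out above would look something like this. `0x1000` is the Windows SDK's `LOCALE_CUSTOM_UNSPECIFIED` value; the helper names are hypothetical. Per the findings above, both methods return `true` for real-but-partial system cultures as well as for made-up ones, so neither one distinguishes "fake" from "real".

```csharp
using System;
using System.Globalization;

public static class LcidChecks
{
    // 0x1000 (4096) is LOCALE_CUSTOM_UNSPECIFIED: the LCID reported for locales
    // that have no LCID of their own.
    private const int LOCALE_CUSTOM_UNSPECIFIED = 0x1000;

    // Insufficient on Windows 10+: true for made-up cultures *and* for partial system cultures.
    public static Boolean HasUnspecifiedLcid( CultureInfo ci ) => ci.LCID == LOCALE_CUSTOM_UNSPECIFIED;

    // Also insufficient: many real-but-incomplete system cultures report 0x1000 here too.
    public static Boolean HasUnspecifiedCompareLcid( CultureInfo ci ) => ci.CompareInfo.LCID == LOCALE_CUSTOM_UNSPECIFIED;
}
```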
So that means there are now only 2 ways to check whether a given `CultureInfo` is "fake" or "real":
- Option 1: During program start-up, build your own private immutable `HashSet<String>` of `CultureInfo` names from `CultureInfo.GetCultures` and use that to validate; see `KnownCultureInfoNameValidator` above.
- Option 2: Check whether `CultureInfo.EnglishName` starts with `"Unknown Locale"` and/or `CultureInfo.Parent.EnglishName` starts with `"Unknown Language"`.
    - While it always feels wrong to compare magic strings, especially human-readable strings, at least `EnglishName` is always English and won't break if a user is running a non-English build of Windows, unlike `Exception.Message`, for example.
    - There just don't seem to be any other documented hints or values that indicate whether a `CultureInfo`'s data is really system-provided or not. None of the other non-`String` members of `CultureData` seem to correlate with it.
    - Be sure to use a `String.StartsWith` check, not `String.Equals`, because the culture name is appended in parentheses at the end (e.g. `"Unknown Locale (en-FR)"`).
- I did initially think that `ThreeLetterWindowsLanguageName == "ZZZ"` might work, but on my computer `CultureInfo.GetCultures` returns 114 neutral cultures and 326 specific cultures with `"ZZZ"` for that property, erk.
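Option 2 and the `"ZZZ"` count can be sketched as follows. The helper names are hypothetical, and the code uses ordinal `StartsWith` so the match does not depend on the current culture. It only encodes the placeholder strings observed on .NET 4.8; it's an assumption that they appear on ICU-based runtimes as well (the table above shows .NET 6 reporting `"English (France)"` as the `DisplayName` for `en-FR`).

```csharp
using System;
using System.Globalization;
using System.Linq;

public static class UnknownLocaleHeuristic
{
    // Option 2: match the English placeholder names used for locales without real data,
    // e.g. "Unknown Locale (en-FR)". StartsWith, not Equals, because the culture name
    // is appended in parentheses.
    public static Boolean LooksSynthesized( CultureInfo ci )
    {
        return ci.EnglishName.StartsWith( "Unknown Locale", StringComparison.Ordinal )
            || ci.Parent.EnglishName.StartsWith( "Unknown Language", StringComparison.Ordinal );
    }

    // Counts cultures of the given kind whose ThreeLetterWindowsLanguageName is "ZZZ",
    // to reproduce the (machine-specific) neutral/specific counts on your own system.
    public static int CountZzz( CultureTypes types )
    {
        return CultureInfo.GetCultures( types ).Count( ci => ci.ThreeLetterWindowsLanguageName == "ZZZ" );
    }
}
```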