Which standard language codes should I use for multilingual software?

Question

I often see the abbreviation "en-US", whose language part corresponds to the 2-character language codes standardized in ISO 639-1. I also understand that a language tag generally consists of a primary language subtag, followed by a series of other subtags separated by hyphens, as explained in https://www.rfc-editor.org/rfc/rfc5646.

That link mentions that there are also 3-letter language codes defined in ISO 639-2, ISO 639-3, and ISO 639-5.

Still, there are more codes defined for Windows/.NET here: http://msdn.microsoft.com/en-us/goglobal/bb896001.aspx. That page refers to the language tags as "culture names", and uses a distinct 3-character code for the "language name". So the "culture name" appears to be built on the 2-character language codes, although I'm not sure why the names vary between Windows versions, or how well they follow the standard language codes. Is "en-US" really a "language code" or is it a "culture name"?

If I'm developing software that uses language codes, which standard should I use? (The 2-character codes or the 3-character codes? If 3-character, then ISO 639-2, ISO 639-3, or ISO 639-5?)

Why should I choose one over the other? (For OS platform or programming framework compatibility?)

Without specific criteria or purpose defined, this looks like an opinion poll. — Jukka K. Korpela, Jan 17 '13 at 18:49
It's not an opinion poll, because I specifically ask "why" one should be chosen over the other. For example, the best choice may depend on the platform. I don't know. If that's the case, then a definitive answer such as "The choice depends on the platform and for platform X, this scheme should be used, because..." would be a valid, non-opinionated answer from an experienced developer. There may also be valid, logical reasons or specific purposes for using 2- vs. 3-letter codes in general, e.g. the scope of languages to be covered. I'm looking for facts and logical reasons, not opinions. — Triynko, Jan 17 '13 at 19:02
I'm not asking "which is better, 2- or 3-character codes", because that would be opinionated. I'm asking "when would I use 2- vs. 3-character codes and why" (eliciting factual or logical information). I'm also interested in whether "culture name" in the Windows API context is actually just a 2-letter "language code". The clearest and most comprehensive response will be marked as the answer. — Triynko, Jan 17 '13 at 19:11

score 3 · Accepted Answer · answered Jan 17 '13 at 19:11

BCP 47 is the industry best-practice standard for identifying languages, and you should use its language tags. BCP 47 dictates that if a language can be identified by either a 2-letter or a 3-letter code, the 2-letter code should be used.
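As a rough illustration of the "prefer the 2-letter code" rule, here is a minimal Python sketch. The table and the function name `prefer_short_language` are made up for this example, and the table is a tiny hand-written subset; a real application would consult the IANA Language Subtag Registry (or a library built on it) instead.

```python
# Illustrative subset only (an assumption for this sketch): the real source
# of truth is the IANA Language Subtag Registry, not this hand-written table.
ISO639_2_TO_1 = {
    "eng": "en",
    "spa": "es",
    "fra": "fr",  # ISO 639-2/T (terminology) code
    "fre": "fr",  # ISO 639-2/B (bibliographic) code
    "deu": "de",  # ISO 639-2/T
    "ger": "de",  # ISO 639-2/B
}


def prefer_short_language(tag: str) -> str:
    """Replace a 3-letter primary language subtag with its 2-letter
    equivalent when one is known; leave every other subtag untouched."""
    subtags = tag.split("-")
    primary = subtags[0].lower()
    subtags[0] = ISO639_2_TO_1.get(primary, primary)
    return "-".join(subtags)


print(prefer_short_language("eng-US"))  # 3-letter code is shortened
print(prefer_short_language("haw-US"))  # Hawaiian has no 2-letter code
```

Languages that have only a 3-letter code, such as Hawaiian ("haw"), keep it, so a field that stores tags needs to accept both lengths.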

Cultures and locales are distinct from language tags in how they conceive of the region information. The region information in a language tag identifies the origin of the particular dialect (en-US is American English, the variety of English that originated in the United States), whereas the region information in a locale identifies the location where the information is relevant. Since the majority of American English speakers also live in the US, the distinction is not really important when it comes to providing information such as how to spell words or format dates or numbers.

Windows is moving away from the concept of a locale or culture to a more expressive notion of language and region (separately identified) which allows us to identify situations such as a speaker of American English who resides in England.

Note that there are cases where Windows still uses legacy names that predate this standard. Depending on how you rely on the OS, you may need to map between standards-compliant names and the legacy names.
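A minimal Python sketch of such a mapping layer follows. The two entries shown are the older .NET culture names zh-CHS and zh-CHT, which correspond to the BCP 47 tags zh-Hans and zh-Hant; the table is deliberately incomplete, and the helper names `to_standard` and `to_legacy` are hypothetical, so check the names your target Windows/.NET version actually reports before relying on it.

```python
# Deliberately incomplete table (an assumption for this sketch): extend it
# with whatever legacy names your target platform actually uses.
LEGACY_TO_STANDARD = {
    "zh-CHS": "zh-Hans",  # Chinese, Simplified script
    "zh-CHT": "zh-Hant",  # Chinese, Traditional script
}

# Lower-cased lookup tables, since tag comparison is case-insensitive.
_LEGACY_LOOKUP = {k.lower(): v for k, v in LEGACY_TO_STANDARD.items()}
_STANDARD_LOOKUP = {v.lower(): k for k, v in LEGACY_TO_STANDARD.items()}


def to_standard(name: str) -> str:
    """Map a legacy culture name to its BCP 47 tag; pass others through."""
    return _LEGACY_LOOKUP.get(name.lower(), name)


def to_legacy(tag: str) -> str:
    """Map a BCP 47 tag back to the legacy name; pass others through."""
    return _STANDARD_LOOKUP.get(tag.lower(), tag)


print(to_standard("zh-CHS"))
print(to_legacy("zh-Hant"))
```

Storing the standard tag everywhere and converting only at the boundary with the OS keeps the legacy names out of your own data.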

Thanks, this is very helpful, especially the best practices document. I have some kids using some reading comprehension software in Texas who speak "Border Spanish". I'm not sure what exactly that dialect corresponds to, but since they are living in the "US" region close to Mexico, I will probably use either "es-US", "es-MX", or possibly a custom/private identifier such as "x-es-BS". How does that sound? This is the first step in making our software support multiple languages, and although the client is Flash-based, our server-side code is all Microsoft products (C#/.NET/SqlServer). — Triynko, Jan 17 '13 at 19:39
I would use es-US. If you were to use a private identifier, it would be something like es-x-BS. — Eric MSFT, Jan 18 '13 at 04:05
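To see why the position of the "x" matters, here is a simplified Python sketch that splits a tag into language, region, and private-use parts. It is not a full RFC 5646 parser (other extension singletons and grandfathered tags are not handled), and `SimpleTag` and `split_tag` are names invented for this example.

```python
from typing import NamedTuple, Optional, Tuple


class SimpleTag(NamedTuple):
    language: Optional[str]
    region: Optional[str]
    private_use: Tuple[str, ...]


def split_tag(tag: str) -> SimpleTag:
    """Split a tag into language, region, and private-use subtags.

    Simplified on purpose: everything after the singleton "x" is treated
    as private use, and the first 2-letter or 3-digit subtag after the
    language is treated as the region.
    """
    subtags = tag.split("-")
    lowered = [s.lower() for s in subtags]

    private: Tuple[str, ...] = ()
    if "x" in lowered:
        i = lowered.index("x")
        private = tuple(lowered[i + 1:])
        subtags = subtags[:i]

    language = subtags[0].lower() if subtags else None

    region = None
    for s in subtags[1:]:
        if (len(s) == 2 and s.isalpha()) or (len(s) == 3 and s.isdigit()):
            region = s.upper()
            break

    return SimpleTag(language, region, private)


print(split_tag("es-x-BS"))  # language is still "es"
print(split_tag("x-es-BS"))  # no language: the whole tag is private use
```

With es-x-BS, software that ignores the private-use part can still fall back to Spanish; with x-es-BS, the entire tag is private and carries no standard language at all.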
