
Should I use ISO 639-1 (two-letter codes) or ISO 639-2 (three-letter codes) to store a user's language code? Both are official standards, but which is the de facto standard in the development community? I think ISO 639-1 would be easier to remember, and is probably more popular for that reason, but that's just a guess.

I'm building a site that will have separate versions for the US, Brazil, Russia, China, and the UK.

http://en.wikipedia.org/wiki/ISO_639

John Himmelman
  • Don't confuse "language" and "geographical location". – Quentin Mar 24 '10 at 20:23
  • I believe we're only creating translations for the most commonly spoken language in each of those countries. I need to make sure I'm using the correct language code, because it will affect the translation file names. – John Himmelman Mar 24 '10 at 20:27
  • You should use the codes `en`, `pt`, `ru`, `zh`, and `en-gb`, and don't forget to check my answer for a full explanation. – sorin Apr 09 '10 at 15:33
  • Projects I've been involved with, including Wiktionary and AbiWord, used two-letter codes for languages that had two-letter codes and three-letter codes otherwise. – hippietrail Feb 05 '14 at 19:21
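The "two-letter code if there is one, three-letter code otherwise" approach from the last comment can be sketched in Python as below. The mapping is a hypothetical five-entry excerpt and the function name is illustrative; a real application would load the full code table published by the ISO 639-3 registration authority (SIL International) instead of hard-coding it.

```python
# Hypothetical excerpt: ISO 639-3 code -> ISO 639-1 code, listed only for
# languages that actually have a two-letter code. Load the full table in real code.
ISO_639_3_TO_1 = {
    "eng": "en",  # English
    "por": "pt",  # Portuguese
    "rus": "ru",  # Russian
    "zho": "zh",  # Chinese (macrolanguage)
    "deu": "de",  # German
}


def preferred_code(iso639_3: str) -> str:
    """Return the two-letter code if one exists, else the three-letter code."""
    code = iso639_3.strip().lower()
    return ISO_639_3_TO_1.get(code, code)
```

For example, `preferred_code("por")` returns `"pt"`, while `preferred_code("haw")` returns `"haw"`, because Hawaiian has no ISO 639-1 code.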

5 Answers


You should use IETF language tags (BCP 47), because they are already used by HTTP, HTML, XML, and many other technologies. They are built on several standards, including the ISO 639 family (yes, language, region, and culture are not that simple to pin down).
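An IETF tag combines a language subtag (from ISO 639) with optional subtags such as a script (e.g. Hant) and a region (e.g. GB or 419). The Python sketch below, with hypothetical names, splits simple tags into those three parts. It is deliberately simplified: variants, extensions, private-use subtags, and extended language subtags are rejected rather than parsed, so it is not a full BCP 47 parser.

```python
from typing import NamedTuple, Optional


class LanguageTag(NamedTuple):
    language: str          # ISO 639 code, e.g. "en", "pt", "zh"
    script: Optional[str]  # ISO 15924 code, e.g. "Hant"
    region: Optional[str]  # ISO 3166-1 alpha-2 or UN M.49 code, e.g. "GB", "419"


def parse_tag(tag: str) -> LanguageTag:
    """Split a simple IETF language tag into language, script, and region.

    Simplified sketch: anything beyond language[-script][-region]
    raises ValueError instead of being parsed.
    """
    parts = tag.split("-")
    language = parts[0].lower()
    if not (2 <= len(language) <= 3 and language.isalpha()):
        raise ValueError(f"not a simple language subtag: {tag!r}")
    script = region = None
    for part in parts[1:]:
        # A four-letter script subtag must come before any region subtag.
        if len(part) == 4 and part.isalpha() and script is None and region is None:
            script = part.title()
        # Region: two letters (ISO 3166-1) or three digits (UN M.49).
        elif region is None and (
            (len(part) == 2 and part.isalpha()) or (len(part) == 3 and part.isdigit())
        ):
            region = part.upper()
        else:
            raise ValueError(f"unsupported subtag {part!r} in {tag!r}")
    return LanguageTag(language, script, region)
```

Case is normalized on output (lowercase language, title-case script, uppercase region), since tags are case-insensitive and `en-gb` and `en-GB` mean the same thing.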

I wrote a more detailed article on choosing and using the proper language codes. The idea is to use the shortest ISO 639-1 codes and to add more specific subtags only for special cases. The article lists codes for the ~30 most-used languages, with my reasons for preferring one alternative over another.

If you want to skip the article, here is a short list of language codes (not to be confused with country codes): ar, cs, da, de, el, en, en-gb, es, fr, fi, he, hu, it, ja, ko, nb, nl, pl, pt, pt-pt, ro, ru, sv, tr, uk, zh, zh-hant

The following points may not be obvious but should be borne in mind:

  • en is used for American English (en-us); British English uses en-gb
  • pt is used for Brazilian Portuguese (pt-br), not for European Portuguese (pt-pt), which has far fewer speakers
  • zh is used instead of zh-hans, zh-CN,...
  • zh-hant (Traditional Chinese) is used instead of more specific codes like zh-hant-TW or zh-TW
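These rules amount to a small lookup table from more specific tags to the preferred short tag. The Python sketch below is illustrative only: the table covers just the cases named in the list above (other tags pass through unchanged), and the names PREFERRED, site_language_code, SITE_TAGS, and SITE_CODES are hypothetical. Applied to the five sites from the question, it produces the en, pt, ru, zh, en-gb codes suggested in the comments.

```python
# Illustrative table of the rules above: more specific tag -> preferred tag.
# Only the cases named in the list are covered; other tags pass through as-is.
PREFERRED = {
    "en-us": "en",
    "pt-br": "pt",
    "zh-hans": "zh",
    "zh-cn": "zh",
    "zh-hant-tw": "zh-hant",
    "zh-tw": "zh-hant",
}


def site_language_code(tag: str) -> str:
    """Map a language tag to the short code preferred by the rules above."""
    key = tag.strip().lower()  # tags are case-insensitive; store lowercase
    return PREFERRED.get(key, key)


# Hypothetical: the five sites from the question and the tag each one targets.
SITE_TAGS = {"US": "en-US", "BR": "pt-BR", "RU": "ru", "CN": "zh-CN", "UK": "en-GB"}
SITE_CODES = {site: site_language_code(tag) for site, tag in SITE_TAGS.items()}
```

Because the output is always lowercase and as short as possible, the same codes can be used directly in translation file names.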

You can find more explanations inside the article.

sorin
  • The [proper language code selection and usage](http://blog.i18n.ro/using-the-proper-language-codes/) URL is throwing a 404. Try to update it or have the relevant information added to your answer. **Ps:** Nice answer, +1. – Zuul Jul 09 '12 at 12:56
  • > 404 Not Found – Nato Boram Nov 04 '19 at 17:06

I would go with a derivative of ISO 639. Specifically, I like to use IETF language tags: http://en.wikipedia.org/wiki/IETF_language_tag

Ben

I'm no expert, but every site I've ever seen uses ISO 639-1, including the current site I'm working on.

It works for us!

Chuck Le Butt
  • +1. I've never seen 639-2 used in any application. Indeed, with collection codes like "cpe" around, you could end up encoding documents that are, in fact, readable by no one. And how many documents in Cree do you really expect? – msw Mar 24 '10 at 20:30

I've only ever seen two-character language codes in use, so I'd recommend going with them unless your work involves linguistics in some way. If all you're doing is customizing the browsing experience for the world at large, you won't need the extra repertoire offered by three-character codes.

Jonathan Leffler

ISO 639-1 alpha-2 codes are used pretty much universally.

They are used for example in HTTP content negotiation. If you ever wondered how an international website can automatically show you their homepage in your native language, that's how it works. (Although it's sometimes kinda annoying. I, for example, often get shown the default Apache homepage in German, because the webmaster turned on content negotiation, but only put content for English in.)
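To make the content-negotiation mechanism concrete, here is a simplified Python sketch of how a server might choose a language from the browser's Accept-Language header (defined in RFC 9110). It handles only language tags and q-values, and when a tag such as de-at is not supported it retries with shorter prefixes (de), roughly in the spirit of the "Lookup" scheme in RFC 4647. The function names are hypothetical; real web servers and frameworks ship their own, more complete implementations.

```python
def parse_accept_language(header: str) -> list[tuple[str, float]]:
    """Return (tag, q) pairs from an Accept-Language header, best first."""
    entries = []
    for item in header.split(","):
        item = item.strip()
        if not item:
            continue
        tag, _, params = item.partition(";")
        q = 1.0  # a missing q-value means q=1
        params = params.strip()
        if params.startswith("q="):
            try:
                q = float(params[2:])
            except ValueError:
                q = 0.0  # treat a malformed q-value as "not acceptable"
        entries.append((tag.strip().lower(), q))
    # sorted() is stable, so entries with equal q keep the client's order.
    return sorted(entries, key=lambda e: e[1], reverse=True)


def negotiate(header: str, supported: list[str], default: str) -> str:
    """Pick the best supported language, falling back from e.g. 'de-at' to 'de'."""
    supported_lower = {s.lower(): s for s in supported}
    for tag, q in parse_accept_language(header):
        if q <= 0:
            continue  # q=0 explicitly rejects this language
        parts = tag.split("-")
        while parts:  # try the full tag, then progressively shorter prefixes
            candidate = "-".join(parts)
            if candidate in supported_lower:
                return supported_lower[candidate]
            parts.pop()
    return default
```

With the header `de-AT,de;q=0.9,en;q=0.8` and only `["en", "ru"]` available, `negotiate` returns `"en"`; with `pt-BR,pt;q=0.9` and `pt` available, it returns `"pt"`.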

Most web browsers use them directly in their settings dialog box.

Most operating systems use them in their settings dialog boxes or configuration files.

Wikipedia uses them in their server names for the different language versions.

In other words: if your users aren't native English speakers, they will probably already have encountered them when configuring their software, because otherwise they wouldn't be able to use their computers.

The other members of the ISO 639 family are mostly of interest to linguists. Unless you expect Jesus Christ himself (ISO 639-2 alpha-3 code arc) to visit your website, or maybe Klingons (tlh), ISO 639-1 has more languages than you can ever hope to support.

Jörg W Mittag
  • It may be true that 639-1 covers all the languages that are *commercially interesting*. But there are thousands of languages not covered by that list, and these languages have schools and books and their speakers are coming online. Please think twice before adding to the struggles of small languages by not allowing their codes when they show up online. – John Hatton Feb 11 '15 at 15:27
  • I wouldn't bother with ISO 639-2; it has basically been superseded by ISO 639-3. – Tsundoku Aug 15 '16 at 16:34