Match language code with countries where this language is an official or commonly used language

Question

Is there a Python library that, given a language code, returns the list of countries where that language is official or commonly used?

For example, language code of "fr" is associated with 29 countries where French is an official language plus 8 countries where it's commonly used.

Anentropic · Answer 1 · 2020-07-29T15:49:16.420

Despite the accepted answer, as far as I can tell none of the xml files underlying pycountry contains a way to map languages to countries. It contains lists of languages and their iso codes, and lists of countries and their iso codes, plus other useful stuff, but not that.

Similarly, the Babel package is great but after digging around for a while I couldn't find any way to list all languages for a particular country. The best you can do is the 'most likely' language: https://stackoverflow.com/a/22199367/202168

So I had to get it myself...

import lxml.etree
import urllib.request

def get_territory_languages():
    """Map each CLDR territory code to the languages spoken there."""
    url = "https://raw.githubusercontent.com/unicode-org/cldr/master/common/supplemental/supplementalData.xml"
    # Close the HTTP response as soon as the payload has been read.
    with urllib.request.urlopen(url) as langxml:
        langtree = lxml.etree.XML(langxml.read())

    territory_languages = {}
    for territory in langtree.find('territoryInfo').findall('territory'):
        langs = {}
        for lang in territory.findall('languagePopulation'):
            langs[lang.get('type')] = {
                'percent': float(lang.get('populationPercent')),
                # officialStatus is absent for unofficial languages, so
                # get() returns None and bool() yields False.
                'official': bool(lang.get('officialStatus'))
            }
        territory_languages[territory.get('type')] = langs
    return territory_languages

You probably want to store the result of this in a file rather than calling across the web every time you need it.
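One way to do that caching is sketched below, under a few assumptions: the result is JSON-serialisable (the dict built above is), and the cache never expires. The helper name `load_cached` and the file name in the usage note are hypothetical, not part of any library.

```python
import json
from pathlib import Path


def load_cached(path, fetch):
    """Return the data stored at `path`, calling `fetch()` only on a cache miss.

    `fetch` is any zero-argument callable returning JSON-serialisable data,
    for example the get_territory_languages function from this answer.
    """
    cache_file = Path(path)
    if cache_file.exists():
        # Cache hit: read the stored copy instead of going over the network.
        return json.loads(cache_file.read_text(encoding="utf-8"))
    data = fetch()
    cache_file.write_text(json.dumps(data), encoding="utf-8")
    return data
```

With that in place, something like `TERRITORY_LANGUAGES = load_cached("territory_languages.json", get_territory_languages)` would only hit the network on the first run. Delete the file to force a refresh.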

This dataset also contains 'unofficial' languages, which you may not want to include. Here's some more example code that keeps only the official ones:

TERRITORY_LANGUAGES = get_territory_languages()

def get_official_locale_ids(country_code):
    country_code = country_code.upper()
    # dict.items() is a view in Python 3 and has no .sort(), so use sorted();
    # most widely-spoken first:
    langs = sorted(
        TERRITORY_LANGUAGES[country_code].items(),
        key=lambda item: item[1]['percent'],
        reverse=True,
    )
    return [
        '{lang}_{terr}'.format(lang=lang, terr=country_code)
        for lang, spec in langs if spec['official']
    ]

>>> get_official_locale_ids('es')
['es_ES', 'ca_ES', 'gl_ES', 'eu_ES', 'ast_ES']
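The question asks for the opposite direction (language code to countries), which the same data can answer by inverting the lookup. The following is a minimal sketch, assuming a dict shaped like the one returned by `get_territory_languages()`. The function name is hypothetical, and the sample figures are made up for illustration; they are not real CLDR values.

```python
def get_territories_for_language(territory_languages, lang_code, official_only=False):
    """List territory codes where `lang_code` appears, highest share first."""
    matches = []
    for territory, langs in territory_languages.items():
        spec = langs.get(lang_code)
        if spec is None:
            continue  # language not listed for this territory
        if official_only and not spec["official"]:
            continue
        matches.append((territory, spec["percent"]))
    # Sort by the share of the territory's population using the language.
    matches.sort(key=lambda pair: pair[1], reverse=True)
    return [territory for territory, _ in matches]


# Illustrative data only (invented percentages, same shape as the real dict).
sample = {
    "FR": {"fr": {"percent": 100.0, "official": True}},
    "CA": {"en": {"percent": 85.0, "official": True},
           "fr": {"percent": 30.0, "official": True}},
    "DZ": {"ar": {"percent": 75.0, "official": True},
           "fr": {"percent": 20.0, "official": False}},
}

print(get_territories_for_language(sample, "fr"))                      # ['FR', 'CA', 'DZ']
print(get_territories_for_language(sample, "fr", official_only=True))  # ['FR', 'CA']
```

Calling it with `official_only=False` returns both the official and the commonly used cases, which matches the two groups described in the question.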

I'm unable to reach the provided "xml". Could you, please, give me any suggestions on how can I download it? — Nuzhdin Vladimir, Jul 29 '20 at 15:24
looks like the url has changed, I've updated the answer with the new url (and for Python 3) — Anentropic, Jul 29 '20 at 15:49

score 8 · Answer 2 · answered Jun 05 '10 at 23:38

Look at the Babel package. It has a pickle file for each supported locale. See the list() function in the localedata module for getting a list of ALL locales. Then write some code to split the locale identifiers into (language, country) pairs.
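The splitting step might look like the sketch below. It works on plain identifier strings such as `fr_CA` or `zh_Hans_CN`, so it does not depend on Babel itself; the function name is hypothetical. As an assumption worth checking against your Babel version, recent releases appear to expose the identifier list as `babel.localedata.locale_identifiers()` rather than `list()`. Note also that a locale existing for a (language, country) pair only shows that locale data is available, not that the language is official there.

```python
from collections import defaultdict


def group_territories_by_language(locale_ids):
    """Group territory codes by language from identifiers like 'fr_CA'."""
    by_language = defaultdict(set)
    for locale_id in locale_ids:
        parts = locale_id.replace("-", "_").split("_")
        language = parts[0].lower()
        for part in parts[1:]:
            # Territory subtags are two letters or three digits;
            # four-letter subtags such as 'Hans' are scripts and are skipped.
            is_alpha2 = len(part) == 2 and part.isalpha()
            is_numeric = len(part) == 3 and part.isdigit()
            if is_alpha2 or is_numeric:
                by_language[language].add(part.upper())
                break
    return {lang: sorted(terrs) for lang, terrs in by_language.items()}
```

For instance, feeding it `["fr", "fr_FR", "fr_CA", "zh_Hans_CN"]` groups `CA` and `FR` under `fr` and `CN` under `zh`.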

answered Jun 05 '10 at 23:38 · John Machin

It's really easy using `babel.languages.get_territory_language_info()` – Rmatt Jan 03 '17 at 17:41
@Rmatt It's amazing how much more a package can become easier to use in six years :-) – John Machin Jan 03 '17 at 21:47
Sure, this is why I also upvoted your answer! You brought a decent path, just made it more precise for newcomers ;) – Rmatt Jan 06 '17 at 14:31
@Rmatt, you should add this as an answer. This is by far the easiest method! – Noah Santacruz Jun 10 '21 at 08:29

score 1 · Answer 3 · answered Jun 11 '21 at 11:42

As requested by @NoahSantacruz, I'm adding this as a separate answer so it's easier to find. Since at least 2017, by far the easiest method has been:

babel.languages.get_territory_language_info()

See the docs http://babel.pocoo.org/en/latest/api/languages.html#babel.languages.get_territory_language_info
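Since that function is keyed by territory, answering the original question means looping over candidate territories and checking each one. A rough sketch follows; the wrapper name `territories_using` and its parameters are hypothetical, and it assumes the returned dict maps language codes to entries with an `official_status` key, as the linked docs describe. The `lookup` parameter exists only so the helper can be tried without Babel installed.

```python
def territories_using(lang_code, territories, lookup=None, official_only=False):
    """Return the territories from `territories` whose language info lists `lang_code`."""
    if lookup is None:
        # Imported lazily so the helper can be exercised without Babel installed.
        from babel.languages import get_territory_language_info
        lookup = get_territory_language_info
    result = []
    for territory in territories:
        info = lookup(territory).get(lang_code)
        if info is None:
            continue  # language not recorded for this territory
        if official_only and not info.get("official_status"):
            continue  # recorded, but without any official status
        result.append(territory)
    return result
```

With Babel installed, `territories_using("fr", ["FR", "BE", "CH", "CA"])` would query the real data; the list of territory codes to scan is something you supply yourself.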

score -1 · Answer 4 · edited Dec 02 '16 at 18:49

Check out Ethnologue

Be careful though...

India has a lot of official languages.

edited Dec 02 '16 at 18:49 · Jed Fox

answered Jul 23 '10 at 19:27 · NinjaCat

score -2 · Accepted Answer · answered Apr 21 '10 at 06:02

pycountry (seriously). You can get it from the Package Index.

answered Apr 21 '10 at 06:02 · doug

I just had a look at the documentation for it, and it doesn't seem like you can provide a language code, and get a list of all the countries that use that language – a_m0d Apr 21 '10 at 06:08
might be worth checking again--the reason i say that is because I used this package for a similar purpose (currencies)--*but* i wasn't able to use the interface. Instead i had to work directly with the five XML databases provided in the package. – doug Apr 21 '10 at 06:25
