
I'm writing a tool for interacting with Wikidata where labels and descriptions are added to items. But I would like to validate that the language is supported before trying to add it.

So my question is: how do I get a list of the allowed language codes? The documentation describes this as UserLanguageCode but gives no information on how to retrieve the allowed values.

I know I can get a list of all the languages currently in use by running the following SQL query against the database, but that is slow and inefficient: `SELECT DISTINCT term_language FROM wb_terms`.

As an aside, is the list of allowed languages the same for MonolingualText statements?

Lokal_Profil
  • Are those the same as [`action=query&meta=siteinfo&siprop=languages`](https://www.wikidata.org/w/api.php?action=help&modules=query%2Bsiteinfo)? – Bergi Sep 30 '17 at 21:13
  • Per https://www.mediawiki.org/wiki/API:Siteinfo those are the UI languages for MediaWiki. It is unclear whether this is the same list as that of allowed label languages. It is not the same as the allowed MonolingualText languages though. E.g. `nl-informal` appears there and is allowed for labels but not for MonolingualText. – Lokal_Profil Sep 30 '17 at 21:48
  • See also https://stackoverflow.com/a/48240614/7879193 – Stanislav Kralin Apr 08 '18 at 17:51

2 Answers


There is now an API for getting the supported content languages (API sandbox):

https://www.wikidata.org/w/api.php?action=query&meta=wbcontentlanguages&wbclcontext=term&format=json&formatversion=2

By default it just returns the language code, but you can add the name and/or autonym (name in that language) via the wbclprop parameter. (To control the language in which the name is returned, set the global uselang parameter.)

To get allowed monolingual text languages, set wbclcontext to monolingualtext instead of term; on Wikidata, you can also set it to term-lexicographical for all language codes supported on lexicographical data (almost but not quite identical to the term languages).
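As a rough illustration of the request described above, here is a minimal Python sketch using only the standard library. The helper names (`build_params`, `extract_codes`, `fetch_codes`) and the User-Agent string are made up for this example, and the parsing assumes the entries sit under `query.wbcontentlanguages`; check that against a real response before relying on it.

```python
import json
from urllib.parse import urlencode
from urllib.request import Request, urlopen

API_URL = "https://www.wikidata.org/w/api.php"


def build_params(context="term", props=("code",)):
    """Query parameters for meta=wbcontentlanguages, as in the URL above."""
    return {
        "action": "query",
        "meta": "wbcontentlanguages",
        "wbclcontext": context,       # "term", "monolingualtext" or "term-lexicographical"
        "wbclprop": "|".join(props),  # any of "code", "name", "autonym"
        "format": "json",
        "formatversion": "2",
    }


def extract_codes(payload):
    """Collect language codes from a decoded response.

    Assumption: entries sit under query.wbcontentlanguages, either as a
    mapping keyed by language code or as a list of per-language objects.
    """
    entries = payload["query"]["wbcontentlanguages"]
    if isinstance(entries, dict):
        return set(entries)
    return {entry["code"] for entry in entries}


def fetch_codes(context="term"):
    """Download the allowed codes for one context (performs a network request)."""
    url = API_URL + "?" + urlencode(build_params(context))
    # Hypothetical User-Agent; replace it with one identifying your tool.
    request = Request(url, headers={"User-Agent": "language-check-sketch/0.1"})
    with urlopen(request) as response:
        return extract_codes(json.load(response))
```

Validating a label language then comes down to a membership test such as `"nl-informal" in fetch_codes("term")`, and passing `"monolingualtext"` instead gives the set to check MonolingualText values against. Fetching each set once and caching it is probably preferable to querying the API for every item.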

Lucas Werkmeister

User hoo on IRC channel #wikidata found this solution:

Get the JSON payload at this address:

https://www.wikidata.org/w/api.php?action=paraminfo&modules=wbsetlabel

And extract

 modules[0].parameters[8].type

There are indeed fewer languages in this list than in the full list of UI languages for MediaWiki.
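Since the position of the parameter may change between releases, a slightly more defensive variant looks the parameter up by name rather than by index. The sketch below is only an illustration: `find_language_values` is a hypothetical helper, the sample payload is made up and heavily trimmed, and it assumes the module list may be wrapped in a top-level `paraminfo` key in the decoded JSON (request the URL with `format=json` to get machine-readable output).

```python
def find_language_values(payload, parameter_name="language"):
    """Return the allowed values of the named parameter of the first module.

    Assumption: the module list sits under a top-level "paraminfo" key; if
    that key is absent, the payload itself is searched, matching the path
    quoted above.
    """
    root = payload.get("paraminfo", payload)
    for parameter in root["modules"][0]["parameters"]:
        if parameter.get("name") == parameter_name:
            return list(parameter["type"])
    raise KeyError(f"no parameter named {parameter_name!r}")


# Trimmed, made-up payload for illustration; a real response lists many more
# parameters and language codes.
sample = {
    "paraminfo": {
        "modules": [
            {
                "name": "wbsetlabel",
                "parameters": [
                    {"name": "id", "type": "string"},
                    {"name": "language", "type": ["en", "nl-informal", "sv"]},
                ],
            }
        ]
    }
}

print(find_language_values(sample))
```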

pintoch
  • Thanks. This worked for getting the allowed languages for `label` and `alias`. It seems as though the list of allowed languages for `MonolingualText` is different and not accessible today. A small note: in case the order of the parameters changes, I would iterate over them and select the `type` of the parameter whose `name` is `language`. – Lokal_Profil Dec 06 '17 at 14:56
  • As per https://meta.stackoverflow.com/questions/335658 I've changed the accepted answer now that the API finally supports these requests. – Lokal_Profil Feb 28 '19 at 23:17