
I have a shell script that uses the Wikidata Query Service (WDQS) to get the data it needs. The SPARQL query that the script runs on WDQS takes a language code as an input parameter.

Is there a way to check in a shell script whether the input language code is a valid Wikimedia language code, i.e. one that appears in the first column of the following list?
https://www.wikidata.org/wiki/Help:Wikimedia_language_codes/lists/all

logi-kal
Sasidhar
  • You mind sharing what **"the given string input"** is -- or am I just missing something? Are you talking about the `/etc/locale.gen` formats? (e.g. `fo_FO.UTF-8 UTF-8`)? If so, then `if grep -q "$input" /etc/locale.gen; then printf "valid locale\n"; else printf "locale not found\n"; fi` – David C. Rankin Jan 12 '18 at 23:44
  • Thank you very much. That's what I wanted. – Sasidhar Jan 12 '18 at 23:53
  • That shows you even a blind squirrel finds a nut every now and then. That was a swinging *guess* at what you needed. Glad it helped. – David C. Rankin Jan 12 '18 at 23:55
  • @DavidC.Rankin I modified the question because the previous version looked too general and simple. – Sasidhar Jan 13 '18 at 00:30
  • @DavidC.Rankin, Wikimedia language codes are different from ISO 639 codes (which are used in `locale.gen` AFAIK). – Stanislav Kralin Jan 14 '18 at 11:16
  • http://wiki.bitplan.com/index.php/SPARQL#WikiMedia_Languages has a query having ISO code + Wikimedia country code + number of speakers that might be giving you another perspective. – Wolfgang Fahl Jan 15 '18 at 14:54

1 Answer


These codes are the possible values of property P424 (Wikimedia language code), i.e. wdt:P424 in SPARQL. From the property proposal:

— Is there a big difference to ISO 639-1?
— Many of them are the same as ISO, but it is not done in a consistent way. Some language codes have two letters, some three, and a few even more. And there are also a few cases where it is completely different (als: ISO: tosk Albanian, Wikimedia: Alemannic).

You could retrieve all these codes using the following simple SPARQL query:

SELECT DISTINCT ?code { [] wdt:P424 ?code } ORDER BY ?code
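To use this query from a shell script (for instance to preload the list, the second option listed further down), one way is to send it with curl and ask for CSV output, which gives a header row followed by one code per line. This is a sketch under assumptions: curl is installed, WDQS honours an `Accept: text/csv` header, and no code contains a comma or double quote (CSV would then quote the value). The names `fetch_wm_codes`, `csv_to_lines` and `WDQS_ENDPOINT` are hypothetical.

```sh
#!/bin/sh
# Sketch: download every wdt:P424 value from WDQS, one code per line.
# Assumes curl is installed and that WDQS honours "Accept: text/csv".

WDQS_ENDPOINT="https://query.wikidata.org/sparql"

# csv_to_lines: drop the CSV header row and the carriage returns of
# CRLF line endings, leaving one bare code per line.
csv_to_lines() {
    tail -n +2 | tr -d '\r'
}

# fetch_wm_codes: print all distinct codes in the order WDQS sorts them.
# -G sends the --data-urlencode data as a URL-encoded GET query string.
fetch_wm_codes() {
    curl -s -G "$WDQS_ENDPOINT" \
        -H 'Accept: text/csv' \
        --data-urlencode 'query=SELECT DISTINCT ?code { [] wdt:P424 ?code } ORDER BY ?code' \
        | csv_to_lines
}
```

For example, `fetch_wm_codes > codes.txt` saves the list once, so later runs can read the file instead of querying WDQS again.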


In fact, the list you linked to is generated periodically by a bot. The full query the bot uses is:

SELECT ?item ?c
(CONCAT("{","{#language:",?c,"}","}") as ?display)
(CONCAT("{","{#language:",?c,"|","en}","}") as ?displayEN)
(CONCAT("{","{#language:",?c,"|","fr}","}") as ?displayFR)
{
  ?item wdt:P424 ?c .
  MINUS{?item wdt:P31/wdt:P279* wd:Q14827288} #--exclude Wikimedia projects
  MINUS{?item wdt:P31/wdt:P279* wd:Q17442446} #--exclude Wikimedia internal stuff
}
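The simple query above may also return codes that are attached only to Wikimedia project or internal items, which the bot's two MINUS clauses leave out of the linked list. If the check should match that list exactly, an ASK query can reuse the same filters. A sketch, assuming curl is installed; `listed_ask_query`, `is_listed_wm_code` and `WDQS_ENDPOINT` are hypothetical names:

```sh
#!/bin/sh
# Sketch: ASK variant of the bot's query, so that only codes that would
# appear in the linked list count as valid. Assumes curl is installed.

WDQS_ENDPOINT="https://query.wikidata.org/sparql"

# listed_ask_query CODE: print an ASK query with the bot's two MINUS filters
# (Wikimedia projects and Wikimedia internal items are excluded).
listed_ask_query() {
    printf 'ASK {\n  ?item wdt:P424 "%s" .\n' "$1"
    printf '  MINUS { ?item wdt:P31/wdt:P279* wd:Q14827288 }\n'
    printf '  MINUS { ?item wdt:P31/wdt:P279* wd:Q17442446 }\n}\n'
}

# is_listed_wm_code CODE: exit status 0 only if WDQS answers true.
is_listed_wm_code() {
    # Reject empty input and characters that would break the string literal.
    case "$1" in
        '' | *\"* | *\\*) return 1 ;;
    esac
    curl -s -G "$WDQS_ENDPOINT" \
        --data-urlencode "query=$(listed_ask_query "$1")" \
        | grep -q 'true'
}
```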

You could:

  • paste the list of valid codes into your script, or
  • preload the list at your script startup, or
  • execute an ASK SPARQL query at every user input.
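For the first two options, the check itself needs no network access: once the codes are in the script or in a file, a whole-line, fixed-string grep decides membership. In this sketch the pasted list is deliberately truncated to four codes, the file is assumed to hold one code per line (e.g. the output of the query above), and `PASTED_CODES`, `CODES_FILE`, `load_codes` and `in_code_list` are hypothetical names.

```sh
#!/bin/sh
# Option 1: codes pasted into the script. Truncated to four examples here;
# a real list would hold every code returned by the query.
PASTED_CODES='en
de
fr
als'

# Option 2: codes saved earlier to a file, one per line.
# Hypothetical default path; override CODES_FILE to point elsewhere.
CODES_FILE="${CODES_FILE:-./wikimedia-codes.txt}"

# load_codes: print the saved list if readable, else the pasted one.
load_codes() {
    if [ -r "$CODES_FILE" ]; then
        cat "$CODES_FILE"
    else
        printf '%s\n' "$PASTED_CODES"
    fi
}

# in_code_list CODE LIST: succeed if CODE is a whole line of LIST.
# -F matches CODE as a fixed string, -x requires the whole line to match.
in_code_list() {
    printf '%s\n' "$2" | grep -Fqx -- "$1"
}

# Usage, once at startup and then per input:
#   codes=$(load_codes)
#   in_code_list "$1" "$codes" && echo "Valid code"
```

Loading the list once at startup means each input is checked locally, with no request per input; the trade-off is that the list can go stale until it is refreshed.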

I would prefer the third option:

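#!/bin/sh
echo "Enter language code:"
read -r code

# Let curl build and URL-encode the query string: -G sends the
# --data-urlencode data as a GET query instead of a POST body.
if curl -s -G https://query.wikidata.org/sparql \
        --data-urlencode "query=ASK { ?lang wdt:P424 \"$code\" }" \
        | grep -q "true"; then
    echo "Valid code"
else
    echo "Invalid code"
fi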
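In the question the language code arrives as an input parameter rather than from a prompt, so the same ASK check can be wrapped in a function that takes the code as an argument. A sketch, assuming curl is installed; `ask_query`, `is_wm_code` and `WDQS_ENDPOINT` are hypothetical names. Empty input and input containing a double quote or backslash are rejected before any request, because they would break out of the SPARQL string literal.

```sh
#!/bin/sh
# Sketch: the interactive ASK check, wrapped as a reusable function.
# Assumes curl is installed and query.wikidata.org is reachable.

WDQS_ENDPOINT="https://query.wikidata.org/sparql"

# ask_query CODE: print the ASK query that tests whether CODE is a P424 value.
ask_query() {
    printf 'ASK { ?lang wdt:P424 "%s" }' "$1"
}

# is_wm_code CODE: exit status 0 if WDQS answers true, non-zero otherwise.
is_wm_code() {
    # Reject empty input and characters that would break the string literal.
    case "$1" in
        '' | *\"* | *\\*) return 1 ;;
    esac
    # -G turns the --data-urlencode data into a URL-encoded GET query string.
    curl -s -G "$WDQS_ENDPOINT" \
        --data-urlencode "query=$(ask_query "$1")" \
        | grep -q 'true'
}
```

For example, `is_wm_code "$1" || { echo "Invalid code: $1" >&2; exit 1; }` near the top of the script stops before the main query runs. Wikimedia asks automated clients to send a descriptive User-Agent, which curl's `-A` option can set.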
Stanislav Kralin