2

How can I get all the labels and aliases for a given country, in both English and the official language(s) of this country, as a single column, from wikidata?

I initially tried the following but UNION seems to drop the labels in the official language:

Attempt #1:

SELECT
  ?country_iso
  (GROUP_CONCAT(DISTINCT(?label) ; separator = ", ") AS ?labels)
WHERE
{
  VALUES ?country { wd:Q878 } # FOR TESTING PURPOSES, TO REMOVE ONCE ISSUE RESOLVED.
                              # country  sovereign state  unincorporated territory
  VALUES ?any_kind_of_country { wd:Q6256 wd:Q3624078      wd:Q783733 }
  ?country wdt:P31 ?any_kind_of_country . hint:Prior hint:runFirst true .
  ?country wdt:P297 ?country_iso ; # ISO 3166-1 alpha-2 code
           wdt:P37 ?official_lang .
  ?official_lang wdt:P424 ?wm_lang_code .
  { ?country rdfs:label ?label . FILTER(LANG(?label) = "en") }
  UNION
  { ?country rdfs:label ?label . FILTER(LANG(?label) = ?wm_lang_code) }
  UNION
  { ?country skos:altLabel ?label . FILTER(LANG(?label) = "en") }
  UNION
  { ?country skos:altLabel ?label . FILTER(LANG(?label) = ?wm_lang_code) }
}
GROUP BY ?country_iso
ORDER BY ?country_iso
| country_iso | labels                                                                                                           |
|-------------|------------------------------------------------------------------------------------------------------------------|
| AE          | United Arab Emirates, ae, , Emirates, the Emirates, the U.A.E., the UAE, the United Arab Emirates, U.A.E., UAE |

If I use two separate variables ?label and ?altLabel and OPTIONAL, then I seem to get all the labels and aliases, but:

  • a. I'm not convinced || is right there, although it seems to produce the right results,
  • b. two columns (?labels and ?altLabels) isn't ideal in my case. (And if I only use one, then only rdfs:label gets returned.)

Attempt #2

SELECT
  ?country_iso
  (GROUP_CONCAT(DISTINCT(?label) ; separator = ", ") AS ?labels)
  (GROUP_CONCAT(DISTINCT(?altLabel) ; separator = ", ") AS ?altLabels)
WHERE
{
  VALUES ?country { wd:Q878 }
                              # country  sovereign state  unincorporated territory
  VALUES ?any_kind_of_country { wd:Q6256 wd:Q3624078      wd:Q783733 }
  ?country wdt:P31 ?any_kind_of_country . hint:Prior hint:runFirst true .
  ?country wdt:P297 ?country_iso ; # ISO 3166-1 alpha-2 code
           wdt:P37 ?official_lang .
  ?official_lang wdt:P424 ?wm_lang_code .
  OPTIONAL { ?country rdfs:label ?label . FILTER(LANG(?label) = "en" || LANG(?label) = ?wm_lang_code) }
  OPTIONAL { ?country skos:altLabel ?altLabel . FILTER(LANG(?altLabel) = "en" || LANG(?altLabel) = ?wm_lang_code) }
}
GROUP BY ?country_iso
ORDER BY ?country_iso
| country_iso | labels                                         | altLabels                                                                                                                           |
|-------------|------------------------------------------------|-------------------------------------------------------------------------------------------------------------------------------------|
| AE          | United Arab Emirates, الإمارات العربية المتحدة | الإمارات, دولة الإمارات العربية المتحدة, ae, , Emirates, the Emirates, the U.A.E., the UAE, the United Arab Emirates, U.A.E., UAE |

What am I missing? Thank you very much in advance for your help!

Disclaimer: relatively new to SPARQL (or rather, very rusty).

Marc Carré
  • 1,446
  • 13
  • 19
  • I wonder if it could be an issue with your query looking for the UAE and Arabic being right-to-left and English left-to-right. Your query seems to work just fine for other countries, see https://w.wiki/5SWw – Valerio Cocchi Jul 13 '22 at 09:59
  • The same seems to happen to Israel, where the official language is Hebrew, also read right-to-left: https://w.wiki/5SXH and https://w.wiki/5SXK – Valerio Cocchi Jul 13 '22 at 10:14
  • It's an issue with the filter eval in the UNION it looks like. – UninformedUser Jul 13 '22 at 10:21
  • MY suggestion, use a property path: `?country rdfs:label|skos:altLabel ?label . FILTER(LANG(?label) = "en" || LANG(?label) = ?wm_lang_code)` like https://w.wiki/5SXg – UninformedUser Jul 13 '22 at 10:21
  • @ValerioCocchi, thanks for looking into this. If I run the 2nd query (using `OPTIONAL`) against France, Russia, and the USA, like you did with the 1st query, I do get more results even for these languages. See also: https://w.wiki/5Sq9 Therefore, it seems like the issue is elsewhere. – Marc Carré Jul 14 '22 at 10:32
  • @UninformedUser, thanks for looking into this. It does seem to do the trick! Thank you so much! https://w.wiki/5SqF Would you the reason? I'd like to understand why. – Marc Carré Jul 14 '22 at 10:39
  • @UninformedUser, I've posted a solution for posterity, but unsure how to attribute you the reward. If you'd like to post a solution with the same content, I'll happily delete mine and mark yours as accepted. – Marc Carré Jul 18 '22 at 20:03

1 Answers1

1

Thanks to @UninformedUser (see comments) I was able to come up with this query, yielding 206 results:

SELECT
  ?country_iso
  (GROUP_CONCAT(DISTINCT(?label) ; separator = " | ") AS ?labels)
WHERE
{
                              # country  sovereign state  unincorporated territory
  VALUES ?any_kind_of_country { wd:Q6256 wd:Q3624078      wd:Q783733 }
  ?country wdt:P31 ?any_kind_of_country . hint:Prior hint:runFirst true .
  ?country wdt:P297 ?country_iso ; # ISO 3166-1 alpha-2 code
           wdt:P37 ?official_lang .
  ?official_lang wdt:P424 ?wm_lang_code .
  ?country rdfs:label|skos:altLabel ?label . FILTER(LANG(?label) = "en" || LANG(?label) = ?wm_lang_code)
}
GROUP BY ?country_iso
ORDER BY ?country_iso

P.S.: An extension of this work, also fetching emoji flags is that query.

Marc Carré
  • 1,446
  • 13
  • 19