The following SPARQL query retrieves all European countries (sovereign states, Q3624078) from Wikidata and loads, for each country, the population figures recorded since 2014 (P1082, population):
SELECT DISTINCT ?item ?itemLabel ?population ?popDate
WHERE {
  ?item wdt:P31 wd:Q3624078. #select only sovereign states
  ?item wdt:P706/wdt:P361*|wdt:P361*|wdt:P30 wd:Q46. #select items which are geographically in Europe
  ?item p:P31 ?statement.
  ?statement ps:P31 wd:Q3624078.
  FILTER NOT EXISTS { ?statement pq:P582 ?end. } #filter out items which are not sovereign states anymore
  FILTER NOT EXISTS { ?item wdt:P31 wd:Q3024240. } #filter out historical countries
  OPTIONAL {
    ?item p:P1082 ?popStatement.
    ?popStatement ps:P1082 ?population;
                  pq:P585 ?popDate.
    FILTER ( YEAR(?popDate) >= 2014 )
  }
  SERVICE wikibase:label { bd:serviceParam wikibase:language "en". }
}
ORDER BY ASC(?itemLabel)
You can test it here.
How can I limit the population data to the most recent population number?
I tried limiting it to a specific year (FILTER (YEAR(?popDate) = 2018)), but this does not work because some countries don't have a population estimate for 2018.
This is similar to an SQL query like SELECT * FROM country c INNER JOIN population p ON p.CountryId = c.Id. In SQL, you would need a subquery with a MAX aggregate to find the most recent population data for each country. How can I achieve the same in SPARQL?
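For reference, the SQL pattern I have in mind looks roughly like the sketch below. The country and population tables and the CountryId/Id join come from the query above; the Name, Population and PopDate columns are hypothetical names I made up for illustration.

```sql
-- Sketch only: Name, Population and PopDate are assumed column names.
SELECT c.Id, c.Name, p.Population, p.PopDate
FROM country c
INNER JOIN population p ON p.CountryId = c.Id
WHERE p.PopDate = (
    -- correlated subquery: most recent population date for this country
    SELECT MAX(p2.PopDate)
    FROM population p2
    WHERE p2.CountryId = c.Id
);
```

Each country then appears once, with the population row whose date equals the latest date found for that country.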