The following SPARQL query asks Wikidata for all European countries (sovereign states, Q3624078) and loads, for each country, the last few known population figures (P1082, population):

SELECT DISTINCT ?item ?itemLabel ?population ?popDate
  WHERE {
    ?item wdt:P31 wd:Q3624078.                                #select only sovereign states

    ?item wdt:P706/wdt:P361*|wdt:P361*|wdt:P30 wd:Q46.        #select items which are geographically in Europe

    ?item p:P31 ?statement.
    ?statement ps:P31 wd:Q3624078.
    FILTER NOT EXISTS { ?statement pq:P582 ?end. }            #filter out items which are not sovereign states anymore
    FILTER NOT EXISTS { ?item wdt:P31 wd:Q3024240. }          #filter out historical countries

    OPTIONAL {
      ?item p:P1082 ?popStatement.
      ?popStatement ps:P1082 ?population;
                    pq:P585 ?popDate.
      FILTER ( YEAR(?popDate) >= 2014 )
    }
    SERVICE wikibase:label { bd:serviceParam wikibase:language "en". }
  }
  ORDER BY ASC(?itemLabel)

You can test it here.

How can I limit the population data to the most recent population number?

I tried limiting it to a specific year (FILTER (YEAR(?popDate) = 2018)), but this does not work because some countries don't have a population estimate for 2018.

This is similar to an SQL query like SELECT * FROM country c INNER JOIN population p ON p.CountryId = c.Id. In SQL, you would need to create a subquery with a MAX statement to find the most recent population data by country. How can I achieve the same in SPARQL?

logi-kal
Jonas Sourlier
  • It's basically the same, you can use a subselect, e.g. just replace your `OPTIONAL` clause with this: `OPTIONAL { ?item p:P1082 ?popStatement. ?popStatement ps:P1082 ?population; pq:P585 ?popDate. FILTER ( ?popDate = ?recentDate ) { select ?item (MAX(?popDate) as ?recentDate) { ?item wdt:P31 wd:Q3624078. OPTIONAL { ?item p:P1082 ?popStatement. ?popStatement ps:P1082 ?population; pq:P585 ?popDate. FILTER ( YEAR(?popDate) >= 2014 ) } } group by ?item } }` – UninformedUser May 05 '19 at 15:11
  • The subquery returns the latest date per country. I didn't do the other FILTER stuff inside the subquery, as this will be done in the outer query anyway. I hope this works as you expect; I did not check the result. – UninformedUser May 05 '19 at 15:12
  • @AKSW works great, thank you! Just one question: you're using the same variable names `?popDate`, `?popStatement` and `?population` inside the subquery. Why does that work? If they are the same variables, and the "inner" variables run over all the population entries, why are the "outer" variables limited to the most recent data? – Jonas Sourlier May 05 '19 at 19:29
  • Ah, I see: the subquery variables are completely separate, even if we're using the same variable names (this is different in SQL). Right? – Jonas Sourlier May 05 '19 at 19:36
  • The only "overlapping" variable seems to be `?item`, which establishes a binding between the subquery and the outer query (like the `ON` clause in an SQL `JOIN`). – Jonas Sourlier May 05 '19 at 19:51
  • @AKSW If you copy your comment to an answer, I'm gonna accept and upvote it. – Jonas Sourlier May 05 '19 at 20:56
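
Laid out as a full query, the subselect approach from the comments might look like the sketch below. It is an adaptation that has not been run against the live endpoint, not the commenter's exact text. The inner variable is renamed to `?date` (a name chosen here) to show that it is independent of the outer `?popDate`, the inner `OPTIONAL` and year filter are dropped, and the property path `p:P1082/pq:P585` stands in for the two statement triples. The prefixes `wd:`, `wdt:`, `p:`, `ps:`, `pq:`, `wikibase:` and `bd:` are assumed to be predefined, as they are on the Wikidata Query Service.

```sparql
SELECT DISTINCT ?item ?itemLabel ?population ?popDate
WHERE {
  ?item wdt:P31 wd:Q3624078 .                          # sovereign states only
  ?item wdt:P706/wdt:P361*|wdt:P361*|wdt:P30 wd:Q46 .  # geographically in Europe

  ?item p:P31 ?statement .
  ?statement ps:P31 wd:Q3624078 .
  FILTER NOT EXISTS { ?statement pq:P582 ?end . }      # no end date on the "sovereign state" statement
  FILTER NOT EXISTS { ?item wdt:P31 wd:Q3024240 . }    # not a historical country

  OPTIONAL {
    # Subquery: one row per country with the latest "point in time" (P585)
    # found on any of its population statements.
    {
      SELECT ?item (MAX(?date) AS ?recentDate)
      WHERE {
        ?item wdt:P31 wd:Q3624078 ;
              p:P1082/pq:P585 ?date .
      }
      GROUP BY ?item
    }
    # Outer pattern: keep only the population statement carrying that date.
    ?item p:P1082 ?popStatement .
    ?popStatement ps:P1082 ?population ;
                  pq:P585 ?popDate .
    FILTER ( ?popDate = ?recentDate )
  }
  SERVICE wikibase:label { bd:serviceParam wikibase:language "en". }
}
ORDER BY ASC(?itemLabel)
```

Only the variables that the subquery projects in its `SELECT` (here `?item` and `?recentDate`) are visible to the outer query, so `?item` acts as the join key, much like the `ON` clause of an SQL `JOIN`. Because the whole block sits inside `OPTIONAL`, a country without any dated population statement should still appear, with `?population` and `?popDate` unbound. If a country has several statements sharing the same latest date, each of them is returned.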

0 Answers