0

I am trying to get all the city names of all countries in the world using this below query. Whenever I execute this below query it returns this message "Query timeout limit reached".

Is there any other way to get all the data before it reaches timeout limit?

SELECT ?country ?countryLabel ?city ?cityLabel
WHERE
{
  ?city wdt:P31/wdt:P279* wd:Q515;
        wdt:P17 ?country .

  SERVICE wikibase:label { bd:serviceParam wikibase:language "[AUTO_LANGUAGE],en". }
}
ORDER BY ?country
Karuppiah RK
  • 3,894
  • 9
  • 40
  • 80
  • Unfortunately I don't think there is a way around the timeout. The alternative is to break the query up into multiple queries and write some code that will run all the queries. That is, first get all the countries and then run the above query for each country. – Henriette Harmse Aug 24 '19 at 11:53
  • @HenrietteHarmse Yeah. I have that idea too. But, sad part is I need to run the query for 220 times.. :-( – Karuppiah RK Aug 25 '19 at 03:53
  • 2
    @HenrietteHarmse I tried without the ORDER BY as philshem said in his first comment (https://opendata.stackexchange.com/questions/15641/sparql-get-all-the-data-before-it-reaches-timeout?noredirect=1#comment18580_15641). It is working fine now. – Karuppiah RK Aug 25 '19 at 04:19
  • honestly, it's weird. I know that `ORDER BY` is always expensive, but in your query there are `64255` cities, that's a tiny number to sort by country then ... for me an issue in the query optimizer – UninformedUser Aug 25 '19 at 09:57

4 Answers4

2

I am not at all sure why, but, this query works for me:

SELECT ?country ?countryLabel ?city ?cityLabel
WHERE
{
  ?city wdt:P31/wdt:P279* wd:Q515;
        wdt:P17 ?country .

  SERVICE wikibase:label { bd:serviceParam wikibase:language "[AUTO_LANGUAGE],en". }
}
ORDER BY ?countryLabel
LIMIT 100000

The two differences from your original query are:

  1. Ordering by countryLabel is, I'm guessing, what you actually wanted instead of ordering by country. In my experience ordering by label is sometimes faster too.
  2. I set a limit number. The query appears to return results of the same length as it would without a limit, since the limit is higher than the proper number of results.
John Skiles Skinner
  • 1,611
  • 1
  • 8
  • 21
  • Thanks for this useful trick. Is there any way to exclude this kind of results `countryLabel: Q11916644` & `cityLabel: Q20108173`? If any one of the city or country label contains this `QXXXXXXXX` it should be excluded from results. – Karuppiah RK Aug 27 '19 at 03:46
  • I don't know yet, but I would like to. Ask a new question about it and I'll look into it. – John Skiles Skinner Aug 27 '19 at 14:05
  • honestly, the query does not always work. I tried right now and got a timeout. I also don't see why this query should be faster. Given the ORDER BY, the whole data has to be computed first and pushing the LIMIT into some intermediate result won't work. I think at just works sometimes by chance given that the whole endpoint is a public shared service with different load at different time. Just my two cents ... – UninformedUser Aug 27 '19 at 17:52
  • excluding labels like `Q11916644` returned from the label service is only possible via a) regex because this is the expected fallback in case there is no such label in the specified language or b) you have to get the labels the "normal" way in SPARQL – UninformedUser Aug 27 '19 at 18:04
  • I also see no reason why it should be faster. I agree that the appearance of it being faster could be chance, and indeed that it does not work every time. I guess a person would have to benchmark it to be sure. Filtering out `Qxxxx` labels seems possible with `FILTER(EXISTS { ?item schema:description ?itemdesc. FILTER(LANG(?itemdesc) = "en") })` – John Skiles Skinner Aug 27 '19 at 18:21
2

I've posted this answer on the Open Data site, based on my comment, but removing ORDER BY made the query go through.

philshem
  • 24,761
  • 8
  • 61
  • 127
2

Here is a query that works using our recently released Wikidata SPARQL Query Service endpoint.

PREFIX wd: <http://www.wikidata.org/entity/>
PREFIX wdt: <http://www.wikidata.org/prop/direct/>
PREFIX bd: <http://www.bigdata.com/rdf#>
PREFIX rdfs: <http://www.w3.org/2000/01/rdf-schema#> 

SELECT distinct ?country ?countryLabel ?city ?cityLabel
WHERE
{
  ?city wdt:P31/wdt:P279* wd:Q515;
        wdt:P17 ?country ;
        rdfs:label ?cityLabel . 
  FILTER (lang(?cityLabel) = "en")

  ?country rdfs:label ?countryLabel .
  FILTER (lang(?countryLabel) = "en")

}
ORDER BY ?country

Live Query Results Page.

1

Here is a query that works.

SELECT DISTINCT ?cityID ?cityIDLabel ?countryID ?countryIDLabel WHERE 
{
  {
    SELECT * WHERE 
    {
      ?cityID wdt:P31 ?cityInstance.
      VALUES (?cityInstance) {
        (wd:Q515)
        (wd:Q5119)
      }
      OPTIONAL {
        ?cityID wdt:P17 ?countryID.
        ?countryID p:P31/ps:P31 wd:Q6256.
      }

      FILTER NOT EXISTS {
        ?cityID wdt:P17 "".
        ?countryID wdt:P30 "".
      }
    }
  }
  SERVICE wikibase:label { bd:serviceParam wikibase:language "[AUTO_LANGUAGE],en". }
}
ORDER BY ?countryIDLabel
user0221441
  • 368
  • 4
  • 11
  • ehm, what is the advantage of your query here? using inline data via `VALUES` would also make the query on the question working. – UninformedUser Aug 25 '19 at 15:59
  • also, what is the purpose of the `FILTER NOT EXISTS`? – UninformedUser Aug 25 '19 at 16:01
  • the advantage is that it doesn't time out due to how the query is structured and orders the results by country. `FILTER NOT EXISTS` filters just does a bit of data cleaning where some empty triples are filtered out (there are some). – user0221441 Aug 25 '19 at 16:26
  • 1
    your query returns `8819` results while there are much more cities in Wikidata according to this query: `SELECT (count(distinct ?city) as ?cnt) WHERE { ?city wdt:P31/wdt:P279* wd:Q515; wdt:P17 ?country . }` which returns `64257` - the reason is obvious, you limited the query to just 2 (sub)classes of city, there are much more like town etc. – UninformedUser Aug 25 '19 at 17:03
  • Also your `FILTER` is literally useless given that `""` denotes the empty string literal and there are neither countries (`wdt:P17`) nor continents (`Wdt:P30`) expressed by a literal given that both are object properties. You can check with `SELECT (count(distinct ?city) as ?cnt) WHERE { ?city wdt:P17 "".}` – UninformedUser Aug 25 '19 at 17:07
  • @joedavid It returns less number of cities. – Karuppiah RK Aug 25 '19 at 23:51
  • 1
    @KaruppiahRK I think you should explicitly list your sub classes as there are many sub-classes that wont be relevant to you. This query returns all the 221 sub-classes of `city`: `SELECT ?subclassLabel WHERE { ?subclass wdt:P279* wd:Q515; SERVICE wikibase:label { bd:serviceParam wikibase:language "[AUTO_LANGUAGE],en". }}`. As you will see there are many incorrect ones (like municipalities) or many that would be not be relevant in your application like small towns, villages, etc. so you I would suggest to include those you need until the query returns a sufficient number of cities. – user0221441 Aug 26 '19 at 07:37
  • @joedavid Yeah Agreed. But, why some of the `cityIDLabel` name showing like this `Q37819251` instead of city name? – Karuppiah RK Aug 26 '19 at 08:19
  • just like wikipedia, wikidata can be edited too. So I assume someone has input incorrect information. That or, these are entries without labels https://www.wikidata.org/wiki/Q3559100. As you can see in the link someone has left the entry incomplete but is still an instance of city. – user0221441 Aug 26 '19 at 08:35
  • the expression `FILTER NOT EXISTS { ?cityID wdt:P17 "". ?countryID wdt:P30 "". }` is still useless, there are no string literals for those properties ... – UninformedUser Aug 26 '19 at 16:46