
I could not find any help for a problem I run into when trying to get a list of all basketball players from Wikidata. First I get the number of players (it is something around 130k). Then I build a query with a specific offset and a limit of 2000. The problem is that I get the same 2000 players every time, whatever the offset is.

(However, if I run the query on https://query.wikidata.org/ with different offsets, the results are always different.)
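The counting step mentioned above can be done with a single aggregate query. The following is a minimal sketch, assuming the same occupation filter (`wdt:P106 wd:Q3665646`) as the paging query; `COUNT_QUERY` and `number_of_pages` are hypothetical names, not part of the original code. It also shows how many LIMIT 2000 pages a given total needs.

```python
# Hypothetical count query, reusing the same triple pattern as the paging query.
COUNT_QUERY = """SELECT (COUNT(?item) AS ?count)
WHERE
{
    ?item wdt:P106 wd:Q3665646.
}
"""


def number_of_pages(total: int, page_size: int) -> int:
    """Number of LIMIT-sized pages needed to cover `total` results (ceiling division)."""
    return -(-total // page_size)


# If the count were exactly 130,000, pages of 2000 would need 65 queries.
print(number_of_pages(130_000, 2000))  # 65
```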

Here is the part of my Python code where the query is created.

offset = 0  # initialize once, before the loop; resetting it inside the loop re-requests the first block every time
while numberOfPlayers > 0:
    numberOfPlayers -= 2000
    queryPlayersBlock = """SELECT ?item ?itemLabel
            WHERE
            {
                ?item wdt:P106 wd:Q3665646.
                SERVICE wikibase:label { bd:serviceParam wikibase:language "[AUTO_LANGUAGE],en". }
            }
            offset """ + str(offset) + """
            limit 2000
            """
    players = get_results(endpoint_url, queryPlayersBlock)["results"]["bindings"]
    # write "label : Q-id" for every player in this block (str.removeprefix needs Python 3.9+)
    for player in players:
        dataFile.write(player["itemLabel"]["value"] + " : "
                       + player["item"]["value"].removeprefix("http://www.wikidata.org/entity/") + "\n")
    offset += 2000  # advance to the next block of 2000
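Separating query construction from execution makes the paging logic easy to check without touching the network. This is a minimal sketch, assuming the same query text as above; `PLAYERS_QUERY_TEMPLATE`, `build_players_query` and `page_offsets` are hypothetical helpers. It shows that each page needs its own offset (0, 2000, 4000, ...), which only happens when `offset` is initialized once, outside the loop.

```python
# Hypothetical template: doubled braces {{ }} are literal SPARQL braces for str.format.
PLAYERS_QUERY_TEMPLATE = """SELECT ?item ?itemLabel
WHERE
{{
    ?item wdt:P106 wd:Q3665646.
    SERVICE wikibase:label {{ bd:serviceParam wikibase:language "[AUTO_LANGUAGE],en". }}
}}
OFFSET {offset}
LIMIT {limit}
"""


def build_players_query(offset: int, limit: int = 2000) -> str:
    """Return the players query for one page (hypothetical helper)."""
    return PLAYERS_QUERY_TEMPLATE.format(offset=offset, limit=limit)


def page_offsets(total: int, page_size: int = 2000) -> list[int]:
    """Offsets 0, page_size, 2*page_size, ... that together cover `total` results."""
    return list(range(0, total, page_size))


offsets = page_offsets(6000)
print(offsets)  # [0, 2000, 4000]
queries = [build_players_query(o) for o in offsets]
print(len(set(queries)))  # 3
```

Distinct offsets only rule out re-requesting the first block. Without ORDER BY, SPARQL does not guarantee the same row order between requests, so pages may still overlap or skip items.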

I found this in the SPARQL documentation: "Using LIMIT and OFFSET to select different subsets of the query solutions will not be useful unless the order is made predictable by using ORDER BY." But when I add ORDER BY, I get the error "Query timeout limit reached".
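One way to keep ORDER BY while reducing the work per request is to order and page only the bare `?item` values in a subquery, so the label service runs on the 2000 rows of one page instead of on every result. This is a sketch under that assumption, not a guaranteed fix: whether it stays under the Wikidata Query Service time limit depends on the query and on server load. `build_ordered_page_query` is a hypothetical helper.

```python
def build_ordered_page_query(offset: int, limit: int = 2000) -> str:
    """Order and page ?item in a subquery, then fetch labels for that page only.

    Hypothetical helper; doubled braces {{ }} are literal SPARQL braces in the f-string.
    """
    return f"""SELECT ?item ?itemLabel
WHERE
{{
    {{
        SELECT ?item
        WHERE {{ ?item wdt:P106 wd:Q3665646. }}
        ORDER BY ?item
        OFFSET {offset}
        LIMIT {limit}
    }}
    SERVICE wikibase:label {{ bd:serviceParam wikibase:language "[AUTO_LANGUAGE],en". }}
}}
"""


print(build_ordered_page_query(4000))
```

The outer query does not repeat ORDER BY, so rows within a page may come back in any order; what the subquery fixes is which items belong to each page.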

logi-kal
Riomare
