
I am iterating through a list, users, of approximately 1000 entries, like so:

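import pandas as pd
from SPARQLWrapper import SPARQLWrapper, JSON

# Assumed setup (not shown in the original post): the public Wikidata endpoint.
sparql = SPARQLWrapper("https://query.wikidata.org/sparql")

def wikidata_user_lookup(id_str):
    # Find the item whose Twitter username (P2002) statement carries the
    # numeric user ID qualifier (P6552) equal to id_str.
    q = f'''
        SELECT ?item ?itemLabel ?kind ?kindLabel
        WHERE 
        {{
            ?item p:P2002 ?twitter .
            ?item wdt:P31 ?kind .
            ?twitter pq:P6552 "{id_str}" .
            SERVICE wikibase:label {{ bd:serviceParam wikibase:language "[AUTO_LANGUAGE],en". }}
        }}
    '''
    sparql.setQuery(q)
    sparql.setReturnFormat(JSON)
    results = sparql.query().convert()
    # pd.json_normalize replaces the deprecated pd.io.json.json_normalize
    results_df = pd.json_normalize(results['results']['bindings'])
    return results_df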

for user in users:
    res = wikidata_user_lookup(user)

So I am submitting ~1000 queries, one after another. As far as I can tell, I'm not running multiple queries in parallel, so shouldn't this be allowed? I am getting HTTPError: HTTP Error 429: Too Many Requests. What's the correct way to deal with this situation?

theQman

1 Answer


I suggest reading the official Query Limits documentation.

It states you can run:

  • One client (user agent + IP) is allowed 60 seconds of processing time each 60 seconds
  • One client is allowed 30 error queries per minute

Therefore your 1000 queries must together stay under 60 seconds of processing time in any 60-second window, and no more than 30 of them per minute may end in an error.

Since you're getting the 429 error, as per the documentation linked above you should check the Retry-After header and wait for the time specified before making more queries.
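
A minimal sketch of that wait-and-retry step, assuming the 429 reaches your code as a urllib.error.HTTPError (which is what the traceback in the question suggests). The helper name call_with_retry_after and its parameters are hypothetical, not part of SPARQLWrapper:

```python
import time
import urllib.error


def call_with_retry_after(fn, max_attempts=5, default_wait=5.0, sleep=time.sleep):
    """Call fn(); on HTTP 429, wait as long as Retry-After says, then retry.

    Hypothetical helper: max_attempts and default_wait are illustrative values.
    """
    for attempt in range(1, max_attempts + 1):
        try:
            return fn()
        except urllib.error.HTTPError as err:
            # Re-raise anything that is not a rate-limit error,
            # and give up once the last attempt has failed.
            if err.code != 429 or attempt == max_attempts:
                raise
            retry_after = err.headers.get("Retry-After") if err.headers else None
            try:
                # Retry-After is usually a number of seconds.
                wait = max(0.0, float(retry_after))
            except (TypeError, ValueError):
                # Header missing or given as an HTTP date: use the fallback.
                wait = default_wait
            sleep(wait)
```

With the function from the question, the loop body would become res = call_with_retry_after(lambda: wikidata_user_lookup(user)).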

Hitobat
  • I can't find any documentation on how to retrieve the `Retry-After` header using SPARQLWrapper. Do you know how? Also, couldn't I unravel my 1000 ids into a single query where I `UNION` all the individual queries into one large query? Thus bypassing the limitations? – theQman Jun 15 '20 at 21:16
  • You could think about using the VALUES keyword. Change the argument of wikidata_user_lookup from a user to a list of users. Then replace "{id_str}" with a variable ?user, and add this line: VALUES ?user {{list_of_users}}. You may need to convert the list to a string first and remove any commas between the items. Having said this, the query may be slow, but it is worth a try. – Valerio Cocchi Jun 15 '20 at 21:26
  • The syntax for VALUES is this: VALUES ?user { ... } – Valerio Cocchi Jun 15 '20 at 21:28
  • Thanks! I think I can get this to work. Are there any guidelines as to how many users I can pack into that list in one shot? 1000 gives me a server hangup, so how large is too large? – theQman Jun 16 '20 at 01:17
  • No idea I'm afraid... I think it will depend on the triplestore. Wikidata uses Blazegraph I think. – Valerio Cocchi Jun 16 '20 at 17:40
  • I'm trying to search for labels from a list instead of ids and it basically works the way you suggest: VALUES ?item { "deep learning"@en "optimization"@en ....} . Now I want to read them from a list, similar to ?user {{list_of_users}}. How can I modify them with @en? Thanks! – Stoyan Dimitrov Oct 04 '20 at 09:05
  • There is a PR handling 429 error opened since 2019... https://github.com/RDFLib/sparqlwrapper/pull/140 Looks good to me though. – menrfa Jan 22 '22 at 11:23
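
The VALUES approach from the comments could be sketched roughly as follows. The helper names (chunked, sparql_literal, build_values_query) are hypothetical, and the batch size of 100 in the usage comment is only a guess, since no safe upper bound came up in the thread. The lang argument shows one way to produce language-tagged literals such as "deep learning"@en.

```python
def chunked(items, size):
    """Yield consecutive slices of items with at most size elements each."""
    for start in range(0, len(items), size):
        yield items[start:start + size]


def sparql_literal(value, lang=None):
    """Quote a plain string as a SPARQL literal, optionally language-tagged."""
    escaped = value.replace("\\", "\\\\").replace('"', '\\"')
    return f'"{escaped}"@{lang}' if lang else f'"{escaped}"'


def build_values_query(id_strs):
    """Build one query that looks up every id in id_strs via VALUES."""
    # Items inside VALUES are separated by spaces, not commas.
    values = " ".join(sparql_literal(s) for s in id_strs)
    return f'''
        SELECT ?item ?itemLabel ?kind ?kindLabel ?user
        WHERE
        {{
            VALUES ?user {{ {values} }}
            ?item p:P2002 ?twitter .
            ?item wdt:P31 ?kind .
            ?twitter pq:P6552 ?user .
            SERVICE wikibase:label {{ bd:serviceParam wikibase:language "[AUTO_LANGUAGE],en". }}
        }}
    '''


# Usage sketch (batch size is an assumption, tune it for your case):
# for batch in chunked(users, 100):
#     sparql.setQuery(build_values_query(batch))
```

Selecting ?user in the result makes it possible to tell which input id each row belongs to, which a single-id query did not need.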