I was trying to run some Wikidata queries with Python requests and multiprocessing (number_workers = 8) and now I'm getting code 403 (Access Forbidden). Are there any restrictions? I've seen here that I should limit myself to 5 concurrent queries, but now even a single query returns no result through Python. It used to work.
Is this Access Forbidden temporary, or am I blacklisted forever? :(
I didn't see any restrictions in their docs, so I wasn't aware I was doing something that would get me banned.
Does anyone know what the situation is?
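For context, my setup is roughly this (a simplified sketch, where run_query and the query list are placeholders wrapping the request code just below):

from multiprocessing import Pool

queries = ['...']  # list of my SPARQL query strings

def run_query(query):
    # placeholder: sends one query using the requests code below
    ...

if __name__ == '__main__':
    number_workers = 8
    with Pool(number_workers) as pool:
        results = pool.map(run_query, queries)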
import requests

wikidata_url = 'https://query.wikidata.org/sparql'
headers = {'User-Agent': 'Chrome/77.0.3865.90'}
# query holds my SPARQL string; headers must go in the headers= keyword, not inside params
r = requests.get(wikidata_url, params={'format': 'json', 'query': query}, headers=headers)
EDIT AFTER FIXES: It turned out that I had been temporarily banned from the server. I changed my user agent to follow the recommended template and waited for the ban to be lifted. The real problem was that I had been ignoring error 429, which tells me that I exceeded my allowed request rate and must retry after some delay (a few seconds). That is what led to the error 403.
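For reference, the template from the Wikimedia User-Agent policy looks roughly like this (the bot name, URL, and email below are placeholders, not my real ones):

headers = {'User-Agent': 'MyWikidataBot/1.0 (https://example.org/mybot; mybot@example.org) python-requests/2.22'}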
I tried to correct this beginner's mistake with the following code, which takes the 429 responses into account. I'm adding this edit because it may be useful for someone else.
import datetime
import time

import requests

def get_delay(date):
    # Convert a Retry-After header value into a number of seconds to wait.
    try:
        # Retry-After may be an HTTP date (always given in GMT, so compare against UTC)...
        date = datetime.datetime.strptime(date, '%a, %d %b %Y %H:%M:%S GMT')
        timeout = int((date - datetime.datetime.utcnow()).total_seconds())
    except ValueError:
        # ...or directly a number of seconds.
        timeout = int(date)
    return timeout
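Retry-After can come back either as a plain number of seconds or as an HTTP date, which is why both branches are needed. For example:

get_delay('120')                            # -> 120
get_delay('Wed, 21 Oct 2020 07:28:00 GMT')  # -> seconds from now until that date (negative if it is in the past)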
def make_request(params):
    # headers carries the policy-compliant User-Agent defined above
    r = requests.get(wikidata_url, params=params, headers=headers)
    print(r.status_code)
    if r.status_code == 200:
        if r.json()['results']['bindings']:
            return r.json()
        else:
            return None
    if r.status_code == 500:
        return None
    if r.status_code == 403:
        return None
    if r.status_code == 429:
        # Too many requests: wait as instructed by Retry-After, then retry.
        timeout = get_delay(r.headers['retry-after'])
        print('Timeout {} m {} s'.format(timeout // 60, timeout % 60))
        time.sleep(timeout)
        return make_request(params)  # retry and propagate the result
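It can then be used like this, for example (the SPARQL string is just an illustration):

query = 'SELECT ?item WHERE { ?item wdt:P31 wd:Q146 . } LIMIT 10'
data = make_request({'format': 'json', 'query': query})
if data:
    for binding in data['results']['bindings']:
        print(binding)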