
I was trying to run some Wikidata queries with Python requests and multiprocessing (number_workers = 8), and now I'm getting code 403 (Access Forbidden). Are there any restrictions? I've seen here that I should limit myself to 5 concurrent queries, but now even a single query returns nothing through Python. It used to work.

Is this Access Forbidden temporary, or am I blacklisted forever? :(

I didn't see any restrictions in their docs, so I wasn't aware that I was doing something that would get me banned.

Does anyone know what the situation is?

wikidata_url = 'https://query.wikidata.org/sparql'
headers = {'User-Agent': 'Chrome/77.0.3865.90'}
# headers is its own keyword argument, not an entry inside params
r = requests.get(wikidata_url, params={'format': 'json', 'query': query}, headers=headers)

EDIT AFTER FIXES: It turned out that I was temporarily banned from the server. I changed my user agent to follow the recommended template and waited for my ban to be lifted. The problem was that I had been ignoring error 429, which tells me that I have exceeded my allowed limit and have to retry after some time (a few seconds). That led to the error 403.
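For reference, the recommended template is a descriptive User-Agent naming the tool and giving contact details, rather than a bare browser token. A minimal sketch of what I mean, where the tool name, URL, email address, and version numbers are placeholders:

headers = {
    'User-Agent': 'MyWikidataBot/1.0 (https://example.org/mybot; mybot@example.org) python-requests/2.25'
}
r = requests.get(wikidata_url, params={'format': 'json', 'query': query}, headers=headers)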

I tried to correct my error, which was caused by inexperience, by writing the following code, which takes this into account. I'm adding this edit because it may be useful to someone else.

import datetime
import time

import requests


def get_delay(date):
    # The Retry-After header is either an HTTP date or a number of seconds.
    try:
        date = datetime.datetime.strptime(date, '%a, %d %b %Y %H:%M:%S GMT')
        # The HTTP date is in GMT, so compare against UTC time.
        timeout = int((date - datetime.datetime.utcnow()).total_seconds())
    except ValueError:
        timeout = int(date)
    return timeout


def make_request(params):
    r = requests.get(wikidata_url, params)
    print(r.status_code)
    if r.status_code == 200:
        if r.json()['results']['bindings']:
            return r.json()
        return None
    if r.status_code in (500, 403):
        return None
    if r.status_code == 429:
        # Too Many Requests: wait for the time the server asks for, then retry.
        timeout = get_delay(r.headers['retry-after'])
        print('Timeout {} m {} s'.format(timeout // 60, timeout % 60))
        time.sleep(timeout)
        return make_request(params)
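For anyone reusing this, a minimal usage sketch; the SPARQL query here is an arbitrary example (instances of house cat, Q146) and not part of my original code:

params = {'format': 'json',
          'query': 'SELECT ?item WHERE { ?item wdt:P31 wd:Q146 . } LIMIT 5'}
result = make_request(params)
if result is not None:
    for binding in result['results']['bindings']:
        print(binding['item']['value'])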
  • I am new to Wikidata and wanted to use it with Python for the first time today. But I also have this 403 error. So I think it is not related to you or to multiprocessing. So maybe you should edit the title. – Krystof May 14 '20 at 17:45
  • It turned out that it was related to me. I was temporarily banned. – Arhiliuc Cristina May 15 '20 at 18:08

1 Answer


The access limits were tightened up in 2019 to try to cope with overloading of the query servers. The generic python-requests user agent was blocked as part of this (I don't know if/when this was reinstated).

Per the Query Service manual, the current rules seem to be:

  • One client (user agent + IP) is allowed 60 seconds of processing time per 60-second interval
  • One client is allowed 30 error queries per minute
  • Clients who don't comply with the User-Agent policy may be blocked completely
  • Access to the service is limited to 5 parallel queries per IP [this may change]; a throttling sketch follows this list
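If you do run queries in parallel, capping the worker count is one way to respect that last limit. A minimal sketch, assuming a make_request helper like the one in the question and a placeholder list of query strings:

from multiprocessing.pool import ThreadPool

MAX_PARALLEL = 5  # matches the per-IP limit above; subject to change


def run_query(query):
    return make_request({'format': 'json', 'query': query})


queries = []  # fill with your SPARQL query strings

with ThreadPool(MAX_PARALLEL) as pool:
    results = pool.map(run_query, queries)

A thread pool is enough here since the work is network-bound; the same cap applies if you use multiprocessing instead.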

I would recommend trying again, running single queries with a more detailed user-agent, to see if that works.

Andrew is gone