
I am developing a Wikipedia bot to analyze editing contributions. Unfortunately, a single run takes hours, and during that time Wikipedia's database replication lag is sure to exceed 5 seconds (the default maxlag value) at some point. The documentation for the API's maxlag parameter recommends detecting the lag error, pausing for X seconds, and retrying.
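
For reference, maxlag travels as an ordinary query parameter on each API request, so a raw contributions query looks something like this (URL illustrative):

https://en.wikipedia.org/w/api.php?action=query&list=usercontribs&ucuser=Example&maxlag=5

When the replication lag exceeds that value, the API refuses the request and returns the maxlag error instead of answering it.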

But all I am doing is reading contributions with:

import pywikibot

site = pywikibot.Site('en', 'wikipedia')
# username and max_per_user_contribs are set elsewhere in the script
usrpg = pywikibot.Page(site, 'User:' + username)
usr = pywikibot.User(usrpg)
for contrib in usr.contributions(total=max_per_user_contribs):
    # (analyzes contrib here)

How can I detect the error and resume the iteration? This is the error:

WARNING: API error maxlag: Waiting for 10.64.32.21: 7.1454429626465 seconds lagged
Traceback (most recent call last):
  File ".../bot/core/pwb.py", line 256, in <module>
    if not main():
  File ".../bot/core/pwb.py", line 250, in main
    run_python_file(filename, [filename] + args, argvu, file_package)
  File ".../bot/core/pwb.py", line 121, in run_python_file
    main_mod.__dict__)
  File "analyze_activity.py", line 230, in <module>
    attrs = usr.getprops()
  File ".../bot/core/pywikibot/page.py", line 2913, in getprops
    self._userprops = list(self.site.users([self.username, ]))[0]
  File ".../bot/core/pywikibot/data/api.py", line 2739, in __iter__
    self.data = self.request.submit()
  File ".../bot/core/pywikibot/data/api.py", line 2183, in submit
    raise APIError(**result['error'])
pywikibot.data.api.APIError: maxlag: Waiting for 10.64.32.21:
    7.1454 seconds lagged [help:See https://en.wikipedia.org/w/api.php for API usage]
<class 'pywikibot.data.api.APIError'>
CRITICAL: Closing network session.

It occurs to me to catch the exception thrown in that line of code:

 raise APIError(**result['error'])

But then restarting the contributions listing for the user seems terribly inefficient. Some users have 400,000 edits, so rerunning that from the beginning is a lot of backsliding.
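
A rough sketch of what I have in mind, using site.usercontribs() directly: remember the timestamp of the last contribution processed and, after a maxlag error, restart the listing from that timestamp rather than from the top. The start/total parameters and the inclusive-resume behavior are assumptions to verify against your pywikibot version:

import time
from pywikibot.data.api import APIError

def iter_contribs_resumable(site, username, total, wait=30):
    """Yield contribution dicts, resuming after maxlag errors instead of restarting."""
    last_ts = None                   # timestamp of the last item yielded
    remaining = total
    while remaining > 0:
        try:
            # usercontribs lists newest-first; 'start' resumes at/older than last_ts
            for contrib in site.usercontribs(user=username, start=last_ts,
                                             total=remaining):
                yield contrib        # dict with 'title', 'revid', 'timestamp', ...
                last_ts = contrib['timestamp']
                remaining -= 1
            return                   # listing exhausted normally
        except APIError as err:
            if err.code != 'maxlag':
                raise                # only retry replication-lag errors
            time.sleep(wait)
            # 'start' is inclusive, so the boundary edit may repeat once;
            # dedupe by 'revid' if exact counts matter

The analysis loop would then iterate over iter_contribs_resumable(site, username, max_per_user_contribs) instead of usr.contributions().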

I have googled for examples of doing this (detecting the error and retrying) but have found nothing useful.

wallyk
  • The error in the traceback is on `attrs = usr.getprops()`, but your code does not have that line (seems to be some mismatch). Is the exception being thrown inside the loop over `usr.contributions()` or before the loop runs? – AbdealiLoKo Aug 18 '16 at 04:46
  • @AJK: It appears to be inside the *contributions* loop. – wallyk Aug 18 '16 at 17:26
  • In that case, if you catch and retry that code, it should be fine, right? The whole contribution list does not need to be pulled again. I'd suggest https://pypi.python.org/pypi/retry which is what I use in my scripts – see the sketch just after these comments – AbdealiLoKo Aug 19 '16 at 09:48
  • It seems setting the configs `maxlag = 20`, `retry_wait = 20`, and `max_retries = 8` has fixed this for me. With these, pywikibot retries the API call more and is more persistent about getting the output. For me, the error hasn't been thrown even once since the change. – AbdealiLoKo Aug 20 '16 at 03:45
  • @AJK: I had never looked in `user-config.py` in any detail before. It looks like there are all kinds of useful buttons and knobs there! – wallyk Aug 20 '16 at 04:40
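
For reference, the retry-decorator approach suggested in the comments might look like this minimal sketch (the tries/delay values are illustrative, and get_props is a hypothetical wrapper):

from retry import retry                          # pip install retry
from pywikibot.data.api import APIError

@retry(APIError, tries=8, delay=20, backoff=2)   # values illustrative
def get_props(usr):
    return usr.getprops()                        # the call that failed in the traceback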

1 Answer


Converting the previous conversation in comments into an answer.

One possible method to resolve this is to catch the error with try/except and re-run the piece of code that caused it.

But pywikibot already does this internally for us! By default, pywikibot retries every failed API call twice if you're using the default user-config.py it generates. I found that increasing the following configs did the trick in my case:

  • maxlag = 20
  • retry_wait = 20
  • max_retries = 8

According to the Maxlag parameter documentation, maxlag is the recommended parameter to increase, especially if you're doing a large number of writes in a short span of time. But the retry_wait and max_retries configs are useful when someone else is writing a lot (as in my case: my scripts just read from the wiki).
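
These normally live in user-config.py, but the same knobs can also be set from the script via pywikibot.config. A minimal sketch using the values above (verify the attribute names against your pywikibot version):

import pywikibot
from pywikibot import config

config.maxlag = 20        # tolerate up to 20 s of replication lag before the API refuses
config.retry_wait = 20    # seconds before the first retry (pywikibot backs off on later ones)
config.max_retries = 8    # retry each failed API call up to 8 times

site = pywikibot.Site('en', 'wikipedia')   # then create the site and proceed as usual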

AbdealiLoKo
  • Yes, my script is almost all reads. Out of about 450,000 reads, there is only one write: the summary of the analysis. – wallyk Aug 21 '16 at 16:20
  • I was blocked for using maxlag higher than the recommended 5, so be aware of that. – Asaf M Mar 12 '20 at 15:33