I'm trying to programmatically retrieve editing history pages from the MusicBrainz website. (musicbrainzngs
is a library for the MB web service, and editing history is not accessible from the web service). For this, I need to login to the MB website using my username and password.
I've tried using the mechanize
module, and using the login page second form (first one is the search form), I submit my username and password; from the response, it seems that I successfully login to the site; however, a further request to an editing history page raises an exception:
mechanize._response.httperror_seek_wrapper: HTTP Error 403: request disallowed by robots.txt
I understand the exception and the reason for it. I take full responsibility for not abusing the site (after all, any usage will be tagged with my username), I just want to avoid manually opening a page, saving the HTML and running a script on the saved HTML. Can I overcome the 403 error?