
I'm using python-mechanize to scrape some web sites, which sometimes simply don't respond to requests; those requests then stay open for too long, so I need to limit the timeout for them.

When using the urlopen method, the timeout can be set via the timeout parameter, but I have not found an easy way to do this with the high-level API, such as the submit or click methods. Ideally the timeout would be set just once for the whole Browser class and all calls would honor it.

It would probably be possible to customize this by passing a custom request_class to every click and submit call, but that would just pollute the code, so I'm looking for a nicer way to set a timeout on mechanize's Browser class (and no, I don't want to change the default socket timeout using socket.setdefaulttimeout).
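
For concreteness, here is a minimal sketch of the situation (the URL is a placeholder, and I'm assuming Browser.open takes the same timeout keyword as urlopen):

import mechanize

br = mechanize.Browser()

# A per-request timeout is easy with the low-level open call:
br.open("http://example.com/form", timeout=10.0)

# ...but the high-level helpers expose no obvious timeout parameter,
# so a hung server keeps these calls waiting indefinitely:
br.select_form(nr=0)
response = br.submit()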

Michal Čihař

2 Answers


It is slightly frowned upon within the Python community, but you can "duck punch" the desired behaviour into the browser class.

Basically, you need to do the following: create a function that does what you want (using a custom request class) and patch it onto the Browser class.

from mechanize import Browser

# Monkey-patch Browser.click so every call uses the custom request class.
browser_click = Browser.click
def my_click(self, *args, **kwds):
    return browser_click(self, *args, request_class=MyRequestClass, **kwds)
Browser.click = my_click
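
Here MyRequestClass could, for example, be a small mechanize.Request subclass that forces a fixed timeout. This is only a sketch; that mechanize.Request accepts a timeout keyword is an assumption you should check against your mechanize version:

import mechanize

class MyRequestClass(mechanize.Request):
    """Request class that applies a fixed timeout to every request."""
    def __init__(self, *args, **kwds):
        # Assumption: mechanize.Request accepts a timeout keyword argument.
        kwds.setdefault("timeout", 10.0)
        mechanize.Request.__init__(self, *args, **kwds)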

If that is too Ruby for your taste, you can create a subclass of Browser that does something similar.

class MyBrowser(Browser):
    def click(self, *args, **kwds):
        # Delegate to the base class, forcing the custom request class.
        return Browser.click(self, *args, request_class=MyRequestClass, **kwds)

I find this a bit cleaner, but it will not work if you have no control over the creation of your Browser instances.
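
Usage then looks the same as with the stock Browser (a sketch; the URL is a placeholder):

br = MyBrowser()
br.open("http://example.com/form")
br.select_form(nr=0)
response = br.submit()  # click() now builds its requests with MyRequestClass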

Hans Then

You could try using a loop that checks the elapsed time, with code such as:

import time

start = time.clock()  # note: time.clock() is deprecated in Python 3
# ... do something
elapsed = time.clock() - start

or

start = time.time()
# ... do something
elapsed = time.time() - start
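
A sketch of how that might look around the slow mechanize call (browser, the selected form, and the 30-second limit are placeholders):

import time

start = time.time()
response = browser.submit()      # the potentially slow high-level call
elapsed = time.time() - start

if elapsed > 30.0:               # placeholder limit in seconds
    # React to the slow site, e.g. skip it on the next scraping pass.
    print("request took %.1f seconds" % elapsed)
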
Tim.DeVries