5

I'm aware that urllib2 is available on Google App Engine as a wrapper around urlfetch and, as you know, Universal Feedparser uses urllib2.

Do you know of any way to set a timeout on urllib2?
Has the timeout parameter of urllib2 been ported to the Google App Engine version?

I'm not interested in a method like:

rssurldata = urlfetch.fetch(rssurl, deadline=..)
feedparser.parse(rssurldata.content)
systempuntoout
  • Is there a specific reason you don't want to use the simpler method you just outlined? – Nick Johnson Jul 27 '10 at 08:47
  • @Nick Hi :)! Uhm, simply because the feed-crawling library I'm working with is kind of sealed and should stay GAE-agnostic. Could you point me to the urllib2 wrapper in the GAE source code? I also don't know whether the current urllib2 timeout is set to 5 seconds or capped at 10 seconds (the max urlfetch deadline). – systempuntoout Jul 27 '10 at 09:46
  • The wrapper leaves the timeout at the default of 5 seconds. I'm not aware of any way to pass a timeout value through the wrapper to the urlfetch API. On the hackish end of things, though, you could always monkeypatch the urlfetch API to default to 10 seconds... – Nick Johnson Jul 27 '10 at 10:12

4 Answers

3

There's no simple way to do this, as the wrapper doesn't provide a way to pass the timeout value through, as far as I know. One hackish option would be to monkeypatch the urlfetch API:

from google.appengine.api import urlfetch

old_fetch = urlfetch.fetch
def new_fetch(url, payload=None, method=urlfetch.GET, headers={},
              allow_truncated=False, follow_redirects=True,
              deadline=10.0, *args, **kwargs):
    # Same signature as the original urlfetch.fetch, but with the
    # deadline defaulted to 10 seconds instead of 5.
    return old_fetch(url, payload, method, headers, allow_truncated,
                     follow_redirects, deadline, *args, **kwargs)
urlfetch.fetch = new_fetch
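
Once the patch is applied (before the first fetch), anything that goes through the urllib2 wrapper, feedparser included, should pick up the 10-second deadline with no GAE-specific code in the library itself. A minimal sketch, assuming the patch above has already run:

import feedparser

# feedparser fetches via urllib2, which on App Engine is backed by
# the (now monkeypatched) urlfetch API
feed = feedparser.parse('http://example.com/rss')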
Nick Johnson
1

I prefer this approach. Since it passes arguments through dynamically, it keeps working across GAE API updates.

# -*- coding: utf-8 -*-
from google.appengine.api import urlfetch

import settings


def fetch(*args, **kwargs):
    """
    Base fetch func with default deadline settings
    """
    fetch_kwargs = {
        'deadline': settings.URL_FETCH_DEADLINE
    }
    fetch_kwargs.update(kwargs)
    return urlfetch.fetch(
        *args, **fetch_kwargs
    )
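
A usage sketch; the settings module, its URL_FETCH_DEADLINE constant, and the module path below are this answer's assumptions, so substitute your own configuration:

# settings.py (assumed to exist)
URL_FETCH_DEADLINE = 10

# caller code: import and use the wrapper instead of urlfetch.fetch
from myapp.http import fetch   # hypothetical module path
result = fetch('http://example.com/rss')
print result.status_code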
tvavrys
0

You can set the default deadline, which is the preferred way:

from google.appengine.api import urlfetch
import urllib, urllib2


class MyClass():

    def __init__(self):
        # every subsequent urlfetch call (including those issued by
        # urllib2 on App Engine) uses a 10-second deadline
        urlfetch.set_default_fetch_deadline(10)

I have a urllib2 opener that I use to enable a CookieJar, but then you can just make simple requests:

response = self.opener.open(self.url_login, data_encoded)

You can easily see the effect if you set the deadline to 0.1.
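
For reference, here's a minimal sketch of such a CookieJar-backed opener (Python 2's cookielib; the login URL and form fields are placeholders, not from the original answer):

import cookielib
import urllib
import urllib2

# opener that remembers cookies across requests
cookie_jar = cookielib.CookieJar()
opener = urllib2.build_opener(urllib2.HTTPCookieProcessor(cookie_jar))

data_encoded = urllib.urlencode({'user': 'name', 'password': 'secret'})
response = opener.open('http://example.com/login', data_encoded)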

Tjorriemorrie
-3

Have you tried setting the socket timeout value? Taken from here:

As of Python 2.3 you can specify how long a socket should wait for a response before timing out. This can be useful in applications which have to fetch web pages. By default the socket module has no timeout and can hang. Currently, the socket timeout is not exposed at the httplib or urllib2 levels. However, you can set the default timeout globally for all sockets using:

import socket
import urllib2

# timeout in seconds
timeout = 10
socket.setdefaulttimeout(timeout)

# this call to urllib2.urlopen now uses the default timeout
# we have set in the socket module
req = urllib2.Request('http://www.voidspace.org.uk')
response = urllib2.urlopen(req)

I'm not sure if GAE reads this value, but it's worth a shot!

Edit:

urllib2 has the ability to pass a timeout parameter:

The optional timeout parameter specifies a timeout in seconds for blocking operations like the connection attempt (if not specified, the global default timeout setting will be used). This actually only works for HTTP, HTTPS, FTP and FTPS connections.
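
On a stock Python 2.6+ runtime that looks like the sketch below; whether the App Engine urllib2 wrapper actually forwards this value to urlfetch is exactly what's in question here:

import urllib2

# per-request timeout in seconds, instead of the global socket default
response = urllib2.urlopen('http://www.voidspace.org.uk', timeout=10)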

advait