I have a small crawler that extracts the content of a simple web page.
from urllib.request import urlopen

def url2dict(url):
    '''
    DOCSTRING: converts two-column data into a dictionary with first column as a key.
    INPUT: URL address as a string
    OUTPUT: dictionary with one key and one value
    '''
    with urlopen(url) as page:
        page_raw = page.read()
...
This function requests the page at url. The problem is that the server responded with a 504 error:
  File "C:\Python38\lib\urllib\request.py", line 222, in urlopen
    return opener.open(url, data, timeout)
  File "C:\Python38\lib\urllib\request.py", line 531, in open
    response = meth(req, response)
  File "C:\Python38\lib\urllib\request.py", line 640, in http_response
    response = self.parent.error(
  File "C:\Python38\lib\urllib\request.py", line 569, in error
    return self._call_chain(*args)
  File "C:\Python38\lib\urllib\request.py", line 502, in _call_chain
    result = func(*args)
  File "C:\Python38\lib\urllib\request.py", line 649, in http_error_default
    raise HTTPError(req.full_url, code, msg, hdrs, fp)
urllib.error.HTTPError: HTTP Error 504: Gateway Time-out
My problem is that I cannot find the default value of the urlopen timeout.
According to https://bugs.python.org/issue18417, there is no timeout by default (timeout = None), at least in Python 3.4:
OK, I reviewed the issue enough to remember: If socket.setdefaulttimeout is never called, then the default timeout is None (no timeout).
What is the current state for 3.8?
If no timeout is set, why did I get this 504 error?
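To make the distinction I am asking about concrete, here is a minimal sketch (not a definitive implementation) that separates an HTTP status sent back by the server, such as 504, from a timeout raised on the client side. The helper name classify_failure and its return labels are my own assumptions for illustration; only HTTPError, URLError and socket.timeout come from the standard library.

```python
import socket
from urllib.error import HTTPError, URLError


def classify_failure(exc):
    """Hypothetical helper: label an exception raised while calling urlopen."""
    # HTTPError is a subclass of URLError, so it has to be checked first.
    # It means the server (or a gateway in front of it) sent back a response
    # with an error status code.
    if isinstance(exc, HTTPError):
        return "http-status-%d" % exc.code
    # A client-side timeout while reading surfaces as socket.timeout directly.
    if isinstance(exc, socket.timeout):
        return "client-timeout"
    # A client-side timeout while connecting is wrapped in URLError.reason.
    if isinstance(exc, URLError) and isinstance(exc.reason, socket.timeout):
        return "client-timeout"
    return "other"
```

The exception in my traceback is an HTTPError carrying code 504, so it would fall into the first branch: a response with that status did come back, which is different from urlopen giving up on its own.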
More details:
One of the traceback frames points to:
  File "C:\Python38\lib\urllib\request.py", line 222, in urlopen
    return opener.open(url, data, timeout)
I opened that file and read:
def urlopen(url, data=None, timeout=socket._GLOBAL_DEFAULT_TIMEOUT,
            *, cafile=None, capath=None, cadefault=False, context=None):
    '''Open the URL url, which can be either a string or a Request object.

    *data* must be an object specifying additional data to be sent to
    the server, or None if no such data is needed.  See Request for
    details.

    urllib.request module uses HTTP/1.1 and includes a "Connection:close"
    header in its HTTP requests.

    The optional *timeout* parameter specifies a timeout in seconds for
    blocking operations like the connection attempt (if not specified, the
    global default timeout setting will be used). This only works for HTTP,
    HTTPS and FTP connections.
So does "(if not specified, the global default timeout setting will be used)" mean that if I define a global variable called timeout, it will be used as the timeout duration?
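As far as I can tell, the "global default timeout" is the value managed by socket.setdefaulttimeout() and read by socket.getdefaulttimeout(), not a variable in my own module. A minimal sketch of how that could be checked, assuming a plain interpreter session (the module-level name timeout below is hypothetical and only there to show that it is an ordinary variable):

```python
import socket

# An ordinary module-level variable (hypothetical, for illustration only).
# Defining it does not change the socket module's default timeout.
timeout = 5

# Current process-wide default; None means "no timeout" and is what a fresh
# interpreter reports unless something has already called setdefaulttimeout().
original = socket.getdefaulttimeout()

# Change the process-wide default that is used when no timeout is passed.
socket.setdefaulttimeout(10.0)
changed = socket.getdefaulttimeout()

# Put the previous value back so the rest of the program is unaffected.
socket.setdefaulttimeout(original)
restored = socket.getdefaulttimeout()

print("before:", original, "after set:", changed, "restored:", restored)
```

Passing the value explicitly, as in urlopen(url, timeout=10), avoids relying on the process-wide default at all.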