This question is about the `timeout` parameter of the `result` method of `QueryJob` objects in the BigQuery Python client.
It looks like the meaning of `timeout` has changed since version 1.24.0.
For example, the documentation for `QueryJob.result` in version 1.24.0 states that `timeout` is:
> The number of seconds to wait for the underlying HTTP transport before using retry. If multiple requests are made under the hood, `timeout` is interpreted as the approximate total time of all requests.
As I understand it, this could be used to limit the total time that the `result` call waits for the results.
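To make this reading concrete, here is a toy model of a *total* timeout shared by several sequential requests. It is not the client library's code: `run_with_total_budget` and the fake requests are hypothetical, and the model assumes each request honors the timeout it is handed, as an HTTP transport would.

```python
import time


def run_with_total_budget(requests, total_timeout):
    """Toy model (hypothetical, not library code) of one *total* timeout.

    `requests` is a list of callables that each accept a per-call timeout in
    seconds. Each call receives only the time left before a single shared
    deadline, so if every call honors its timeout, the whole sequence ends
    at roughly `total_timeout` at the latest.
    """
    deadline = time.monotonic() + total_timeout
    results = []
    for request in requests:
        remaining = deadline - time.monotonic()
        if remaining <= 0:
            raise TimeoutError(f"total budget of {total_timeout}s exhausted")
        results.append(request(remaining))
    return results
```

Later requests get progressively smaller timeouts, which is what makes the sum approximately bounded.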
For example, consider the following script:
```python
import logging

from google.cloud import bigquery

# Set the logging level to DEBUG in order to see the HTTP requests
# being made by urllib3
logging.basicConfig(level=logging.DEBUG)

PROJECT_ID = "project_id"  # replace with the actual project ID
client = bigquery.Client(project=PROJECT_ID)

QUERY = ('SELECT name FROM `bigquery-public-data.usa_names.usa_1910_2013` '
         'WHERE state = "TX" '
         'LIMIT 100')
TIMEOUT = 30  # in seconds

query_job = client.query(QUERY)  # API request - starts the query
assert query_job.state == 'RUNNING'

# Waits for the query to finish
iterator = query_job.result(timeout=TIMEOUT)
rows = list(iterator)
assert query_job.state == 'DONE'
```
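One way to observe the difference empirically is to measure how long the `result` call actually takes under each version. `timed` below is a hypothetical helper, not part of the client library; it simply wraps any callable with a wall-clock measurement.

```python
import time


def timed(fn, *args, **kwargs):
    """Hypothetical helper: call fn and return (value, elapsed seconds)."""
    start = time.monotonic()
    value = fn(*args, **kwargs)
    return value, time.monotonic() - start
```

In the script, this could be used as `rows, elapsed = timed(lambda: list(query_job.result(timeout=TIMEOUT)))`, then comparing `elapsed` with `TIMEOUT` on a slow query.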
As I understand it, if all the API calls involved in fetching the results added up to more than 30 seconds, the call to `result` would give up. So `timeout` here serves to limit the total execution time of the `result` call.
However, later versions introduced a change. For example, the documentation for `result` in 1.27.2 states that `timeout` is:
> The number of seconds to wait for the underlying HTTP transport before using retry. If multiple requests are made under the hood, `timeout` applies to each individual request.
If I understand this correctly, the example above changes meaning completely, and the call to `result` could take more than 30 seconds.
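For contrast, here is a toy model of the per-request reading (again hypothetical, not the library's code): every request is handed the full `timeout`, so nothing bounds their sum, and retries on top of that could stretch the total further.

```python
def run_with_per_request_timeout(requests, timeout):
    """Toy model (hypothetical, not library code) of a *per-request* timeout.

    Each request gets the full `timeout`, regardless of how long the earlier
    requests already took.
    """
    return [request(timeout) for request in requests]


def worst_case_total(num_requests, timeout):
    """Upper bound on total time if every request runs just under `timeout`
    (ignoring retries, which can only add to it)."""
    return num_requests * timeout
```

With three requests and `timeout=30`, the worst case under this reading is about 90 seconds rather than 30.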
My questions are:

- What exactly is the difference in the behavior of the script above when it runs with the new version of `result` versus the old one?
- What are the currently recommended use cases for passing a `timeout` value to `result`?
- What is the currently recommended way to time out after a given total time while waiting for query results?
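For reference, here is one possible approach, sketched under assumptions rather than as an officially recommended pattern: poll `done()` against a single wall-clock deadline and call `cancel()` once it passes. `wait_for_job` is a hypothetical helper; it only assumes the job object has `done()`, `cancel()` and `result()` methods, as `QueryJob` does. Each `done()` call may itself make an HTTP request, so the deadline is approximate, and it bounds only the wait for the job to finish, not the later download of rows.

```python
import time


def wait_for_job(job, total_timeout, poll_interval=1.0):
    """Hypothetical helper: wait at most ~total_timeout seconds for `job`.

    Assumes `job` exposes done(), cancel() and result() like QueryJob.
    All polls share one deadline; on expiry the job is cancelled (best
    effort) and TimeoutError is raised.
    """
    deadline = time.monotonic() + total_timeout
    while not job.done():
        remaining = deadline - time.monotonic()
        if remaining <= 0:
            job.cancel()  # best effort: the job may still finish server-side
            raise TimeoutError(f"job not done after {total_timeout}s")
        time.sleep(min(poll_interval, remaining))
    return job.result()
```

In the script, this would replace the `result` call with something like `rows = list(wait_for_job(query_job, total_timeout=TIMEOUT))`.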
Thank you.