
I connect to a DB2 server through an ODBC connection in my Python code. The DB2 server gets rebooted for maintenance, or disconnects me while running certain server-side tasks; this happens once or twice a day. If my code has started executing the pandas read_sql function to fetch a query result at that time, it goes into an infinite wait, even when the server comes back up after, say, an hour.

I want to put a timeout on the execution of read_sql, and whenever that timeout occurs I want to refresh the connection to the DB2 server, so that a fresh connection is made before the query continues.

I have tried making a while loop and pulling chunks of data from DB2 instead of fetching the whole result at once, but the problem remains: if DB2 disconnects while a chunk is being pulled, the Python code still goes into an infinite wait.

import sys
import pandas as pd

chunk_size = 1000
offset = 0
while True:
    sql = "SELECT * FROM table_name LIMIT %d OFFSET %d" % (chunk_size, offset)
    df = pd.read_sql(sql, conn)
    df.index += offset + 1
    offset += chunk_size
    sys.stdout.write('.')
    sys.stdout.flush()
    if df.shape[0] < chunk_size:
        break

I need read_sql to throw an exception or return a value if the SQL execution takes more than 3 minutes. If that happens, I need the connection to DB2 to be refreshed.

Sumit Singh
  • Did you find a proper way to this? – Chamin Wickramarathna Apr 17 '20 at 05:15
  • @ChaminWickramarathna I am not sure if this is the best way, but the solution marked below for this problem has been handy to me in many other situations as well, including this one. – Sumit Singh Nov 17 '20 at 17:19
  • What is the default internal timeout for pandas read_sql_query? Does anyone know? My goal is an optional parameter to override the internal timeout if provided, and otherwise keep the default. – Shubhank Gupta Feb 16 '22 at 07:34

3 Answers


You could use the package func-timeout. You can install it via pip as below:

pip install func-timeout

So, for example, if you have a function doit('arg1', 'arg2') that you want to limit to running for 5 seconds, with func_timeout you can call it like this:

from func_timeout import func_timeout, FunctionTimedOut

try:
    doitReturnValue = func_timeout(5, doit, args=('arg1', 'arg2'))
except FunctionTimedOut:
    print("doit('arg1', 'arg2') could not complete within 5 seconds, hence terminated.\n")
except Exception as e:
    # Handle any exceptions that doit might raise here
    raise
Ashan Priyadarshana
  • Not something I expected but it does work another way. So thanks. – Sumit Singh Sep 06 '19 at 05:04
  • Can we have an optional parameter here for the timeout to function? e.g read_sql_query can take an optional argument if provided otherwise run with default internal timeout value of that function. – Shubhank Gupta Feb 16 '22 at 07:36
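Tying this back to the question's reconnect requirement: the same timeout-then-retry idea can be expressed with only the standard library, using concurrent.futures instead of func-timeout. A sketch, where `fetch` and `reconnect` are placeholder callables you would supply (e.g. a lambda around pd.read_sql and your DB2/ODBC connect routine):

```python
import concurrent.futures

def fetch_with_timeout(fetch, timeout_s, reconnect, retries=3):
    """Run fetch(conn) under a timeout; on timeout, rebuild the connection
    via reconnect() and retry. The timed-out worker thread is abandoned,
    not killed (Python threads can't be stopped from outside)."""
    conn = reconnect()
    for _ in range(retries):
        pool = concurrent.futures.ThreadPoolExecutor(max_workers=1)
        future = pool.submit(fetch, conn)
        try:
            return future.result(timeout=timeout_s)
        except concurrent.futures.TimeoutError:
            conn = reconnect()  # assume the old connection is stale
        finally:
            pool.shutdown(wait=False)  # return immediately; don't join a hung worker
    raise RuntimeError("query did not finish after %d attempts" % retries)
```

Usage would look like `df = fetch_with_timeout(lambda c: pd.read_sql(sql, c), 180, make_db2_connection)`, where `make_db2_connection` is your own connect function.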

Theoretically you could set a read timeout on your database backend. If you're using SQLAlchemy with pymysql, for example, it should be:

from sqlalchemy import create_engine
eng = create_engine("mysql+pymysql://username:password@host:port/database",
    connect_args={'connect_timeout': 10.0, 'read_timeout': 180.0})

However, read_timeout didn't work in my case; I'm not sure if that's an issue with pymysql. If you have a different backend, give it a try by looking up the connector's documentation and passing the corresponding keyword argument via the connect_args parameter (it may also be named read_timeout).
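Since the question connects to DB2 over ODBC, pyodbc's Connection.timeout attribute may be worth trying: it sets a per-query timeout in seconds, and a query exceeding it should raise OperationalError. A sketch (the connection string is a placeholder, and whether the DB2 ODBC driver honors the timeout is an assumption to verify):

```python
def connect_db2(conn_str, query_timeout=180):
    """Open an ODBC connection with a per-query timeout in seconds.
    Whether the timeout is actually enforced depends on the ODBC driver."""
    import pyodbc  # imported here so the sketch stays self-contained
    conn = pyodbc.connect(conn_str)
    conn.timeout = query_timeout  # queries exceeding this should raise OperationalError
    return conn

# usage sketch; the DSN below is a placeholder
# conn = connect_db2("DSN=DB2PROD;UID=user;PWD=secret", query_timeout=180)
# df = pd.read_sql(sql, conn)
```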

ascripter

Here is a generic thread-based timeout decorator, similar to the func-timeout package mentioned by @ashan-priyadarshana, but without requiring an external package:

import functools
import threading


class TimeoutException(Exception):
    pass


def timeout(seconds: float):
    """Return a decorator that runs the wrapped function in a worker thread
    and raises TimeoutException when it isn't done within `seconds`. The
    worker hands its return value (or exception) back through shared lists,
    since a thread's target can't return a value directly."""
    def decorator(func):
        @functools.wraps(func)
        def wrapper(*args, **kwargs):
            result, error = [], []

            def worker():
                try:
                    result.append(func(*args, **kwargs))
                except BaseException as exc:
                    error.append(exc)

            thrd = threading.Thread(target=worker)
            thrd.start()
            thrd.join(timeout=seconds)
            if thrd.is_alive():
                raise TimeoutException(f"{func} didn't finish after {seconds:.1f} s")
            if error:
                raise error[0]  # re-raise whatever func raised in the worker
            return result[0]

        return wrapper

    return decorator


# Usage example

import time

@timeout(1.5)
def test(a, b, c):
    time.sleep(1)
    print(a)
    time.sleep(1)
    print(b)
    time.sleep(1)
    print(c)
    return a + b + c

x = test(1, 2, 3)


>> 1
>> TimeoutException: <function test at 0x00000249C35D2E50> didn't finish after 1.5 s
>> 2
>> 3

x will only be defined if the function finishes before the timeout. Also note that the thread isn't killed (Python doesn't support stopping threads from outside) and continues printing to stdout. That can be a drawback when applying this scheme to an SQL connection, which will still try to execute the query in the background.

Applied to pandas.read_sql, it is most handy to wrap the function directly, i.e.

timeout(180)(pd.read_sql)(sql, conn)
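To also satisfy the reconnect requirement from the question, the wrapped call can go inside a small retry loop. A sketch, where `make_connection` and `timed_read_sql` are placeholders for your DB2 connect routine and the timeout-wrapped pd.read_sql (e.g. `lambda c: timeout(180)(pd.read_sql)(sql, c)`):

```python
def read_with_refresh(timed_read_sql, make_connection, retries=3):
    """Call timed_read_sql(conn); if it fails (e.g. with the TimeoutException
    raised by the decorator above, or a dropped-connection error), open a
    fresh connection via make_connection() and retry."""
    conn = make_connection()
    last_error = None
    for _ in range(retries):
        try:
            return timed_read_sql(conn)
        except Exception as exc:
            last_error = exc
            conn = make_connection()  # refresh the stale connection
    raise last_error
```

Catching Exception broadly is a deliberate choice here: both a timeout and a severed DB2 connection should lead to the same recovery step, a fresh connection.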
ascripter