
I am using the sqlalchemy package in Python. I have an operation that takes some time to execute after I perform an autoload on an existing table. This causes the following error when I attempt to use the connection:

sqlalchemy.exc.OperationalError: (OperationalError) (2006, 'MySQL server has gone away')

I have a simple utility function that performs an insert many:

from sqlalchemy import create_engine, MetaData, Table

def insert_data(data_2_insert, table_name):
    engine = create_engine('mysql://blah:blah123@localhost/dbname')
    # MetaData is a catalog of Table objects.
    metadata = MetaData()
    mytable = Table(table_name, metadata, autoload=True, autoload_with=engine)
    for c in mytable.c:
        print(c)
    column_names = tuple(c.name for c in mytable.c)
    final_data = [dict(zip(column_names, x)) for x in data_2_insert]
    ins = mytable.insert()
    conn = engine.connect()
    conn.execute(ins, final_data)
    conn.close()

It is the following line that takes a long time to execute, since 'data_2_insert' has 677,161 rows.

final_data = [dict(zip(column_names, x)) for x in data_2_insert]
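
One way to shrink that idle window is to build the row dictionaries and insert in chunks rather than materializing the whole list before touching the connection. A rough sketch of that idea (the chunked variant, its name, and the chunk_size value are illustrative, not from the original function):

from sqlalchemy import create_engine, MetaData, Table

def insert_data_chunked(data_2_insert, table_name, chunk_size=10000):
    engine = create_engine('mysql://blah:blah123@localhost/dbname')
    metadata = MetaData()
    mytable = Table(table_name, metadata, autoload=True, autoload_with=engine)
    column_names = tuple(c.name for c in mytable.c)
    ins = mytable.insert()
    conn = engine.connect()
    chunk = []
    for row in data_2_insert:
        chunk.append(dict(zip(column_names, row)))
        if len(chunk) >= chunk_size:
            conn.execute(ins, chunk)  # executemany on this chunk only
            chunk = []
    if chunk:
        conn.execute(ins, chunk)  # flush the remainder
    conn.close()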

I came across this question which refers to a similar problem. However, I am not sure how to implement the connection management suggested by the accepted answer, because robots.jpg pointed this out in a comment:

Note for SQLAlchemy 0.7 - PoolListener is deprecated, but the same solution can be implemented using the new event system.

If someone can please show me a couple of pointers on how I could go about integrating the suggestions into the way I use sqlalchemy I would be very appreciative. Thank you.

codingknob

2 Answers


I think you are looking for something like this:

from sqlalchemy import exc, event
from sqlalchemy.pool import Pool
import MySQLdb  # the default DBAPI for mysql:// URLs

@event.listens_for(Pool, "checkout")
def check_connection(dbapi_con, con_record, con_proxy):
    '''Listener for Pool checkout events that pings every connection before use.
    Implements the pessimistic disconnect-handling strategy. See also:
    http://docs.sqlalchemy.org/en/rel_0_8/core/pooling.html#disconnect-handling-pessimistic'''

    cursor = dbapi_con.cursor()
    try:
        cursor.execute("SELECT 1")  # could also be dbapi_con.ping(),
                                    # not sure what is better
    # a raw DBAPI cursor raises the driver's own OperationalError
    # (here MySQLdb's), not sqlalchemy.exc.OperationalError
    except MySQLdb.OperationalError as ex:
        if ex.args[0] in (2006,   # MySQL server has gone away
                          2013,   # Lost connection to MySQL server during query
                          2055):  # Lost connection to MySQL server at '%s', system error: %d
            # caught by the pool, which will retry with a new connection
            raise exc.DisconnectionError()
        else:
            raise

If you wish to trigger this strategy conditionally, avoid the decorator here and instead register the listener using the listen() function:

# somewhere during app initialization
if config.check_connection_on_checkout:
    event.listen(Pool, "checkout", check_connection)
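
Since the listener is registered against the global Pool class, it is enough to define it once near the top of the script, before the engine is created; insert_data() itself does not need to change (see also the comments below). A minimal sketch of that layout, with the bodies elided:

from sqlalchemy import create_engine, exc, event, MetaData, Table
from sqlalchemy.pool import Pool

@event.listens_for(Pool, "checkout")
def check_connection(dbapi_con, con_record, con_proxy):
    pass  # ping logic from the answer above

def insert_data(data_2_insert, table_name):
    pass  # body unchanged; every engine.connect() now goes through check_connection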


Palasaty
  • Thank you for the solution example. My question is where do I place this code? Can I put it in my script? I'm not quite sure how to integrate it into my utility function shown in my post. Where do I call check_connection()? I appreciate the help. – codingknob Apr 02 '13 at 17:08
  • To clarify my environment: I have a VM (Linux Red Hat) which has MySQL server running on it. My script runs locally on the same machine. It is a simple script that does an insert many after processing/parsing a CSV file. I don't have any web apps or anything of that nature. – codingknob Apr 02 '13 at 17:25
  • Yes, you put the `check_connection()` function inside your script. The `event.listens_for` decorator registers the event handler automatically, so no changes to `insert_data()` are necessary. You can add logging to `check_connection()` just to be sure it is called. – Palasaty Apr 02 '13 at 22:48
  • By the way, can you measure how much time it takes to generate `final_data` list? – Palasaty Apr 02 '13 at 22:50
  • It takes 54 seconds, which isn't bad actually. The connection should remain open. I think the problem is that final_data is too big. I have 650,000 rows with 10 columns. I don't think the insert statement can handle that. I am instead going to write final_data to a CSV file and then populate the database using LOAD DATA INFILE. That said, I still need to use this script to keep the database current when new data becomes available at the end of each day. The size of final_data for daily data is small (~500 rows max), so this should work. I will implement your solution to be safe. – codingknob Apr 04 '13 at 17:31

There is a better way to handle it right now - pool_recycle

engine = create_engine('mysql://...', pool_recycle=3600)

MySQL closes connections that have been idle for 8 hours by default. The connection is then gone, but the engine above it (such as SQLAlchemy) does not know about it.
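
The 8-hour figure is only the default for MySQL's wait_timeout variable, so it is worth checking what your server actually uses. A quick sketch (the connection URL is a placeholder):

from sqlalchemy import create_engine, text

engine = create_engine('mysql://blah:blah123@localhost/dbname')
with engine.connect() as conn:
    # prints e.g. ('wait_timeout', '28800') -- 28800 seconds = 8 hours
    print(conn.execute(text("SHOW VARIABLES LIKE 'wait_timeout'")).fetchone())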

There are 2 ways to solve it -

  1. Optimistic - using pool_recycle
  2. Pessimistic - using pool_pre_ping=True

I prefer to go with pool_recycle as it doesn't emit a SELECT 1 on every connection checkout, which puts less stress on the db.
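
For completeness, the pessimistic option is likewise a single create_engine keyword (available since SQLAlchemy 1.2); a sketch using the same placeholder URL as above:

# ping each pooled connection on checkout and transparently
# replace it if it turns out to be stale
engine = create_engine('mysql://...', pool_pre_ping=True)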

Gal Bracha
  • when you restart `mysqld` in the background while your tornado app is running, neither of the two works, and you get the `lost connection` error ... and I can reproduce it anytime (without waiting 8 hours for MySQL to say `gone away`...) – Ricky Levi Mar 26 '23 at 09:57
  • Just wanted to comment for others: if you're using `tornado-sqlalchemy`, make sure to place this configuration under `db = SQLAlchemy(url=..., engine_options={ "pool_pre_ping": True })` – Ricky Levi Mar 26 '23 at 11:28