
I have a Python script that sets up several Gearman workers. They call methods on my SQLAlchemy models, which are also used by a Pylons app.

Everything works fine for an hour or two, then the MySQL thread gets lost and all queries fail. I cannot figure out why the thread is getting lost (I get the same results on three different servers) when I am setting such a low value for pool_recycle. Also, why wouldn't a new connection be created?

Any ideas of things to investigate?

import gearman
import json
import ConfigParser
import sys
from sqlalchemy import create_engine

class JSONDataEncoder(gearman.DataEncoder):
    @classmethod
    def encode(cls, encodable_object):
        return json.dumps(encodable_object)
    @classmethod
    def decode(cls, decodable_string):
        return json.loads(decodable_string)

# get the ini path and load the gearman server ips:ports
try:
    ini_file = sys.argv[1]
    lib_path = sys.argv[2]
except IndexError:
    raise Exception("ini file path or anypy lib path not set")

# get the config
config = ConfigParser.ConfigParser()
config.read(ini_file)
sqlalchemy_url = config.get('app:main', 'sqlalchemy.url')
gearman_servers = config.get('app:main', 'gearman.mysql_servers').split(",")

# add anypy include path
sys.path.append(lib_path)
from mypylonsapp.model.user import User, init_model
from mypylonsapp.model.gearman import task_rates

# sqlalchemy setup, recycle connection every hour
engine = create_engine(sqlalchemy_url, pool_recycle=3600)
init_model(engine)

# Gearman Worker Setup
gm_worker = gearman.GearmanWorker(gearman_servers)
gm_worker.data_encoder = JSONDataEncoder()

# register the workers
gm_worker.register_task('login', User.login_gearman_worker)
gm_worker.register_task('rates', task_rates)

# work
gm_worker.work()
Tony
  • Another thought, are you freeing up the sqlalchemy sessions between tasks or just re-using them? – David Feb 03 '11 at 21:03
  • @David - no I was not freeing them up, reusing them. Did more digging and found "Can't reconnect until invalid transaction is rolled back" in the logs. So I tried doing a session.rollback() after each job, but that conflicts with the functionality of my one job, which caches the sqlalchemy response object and reuses it on subsequent jobs for big speedups. So what I am trying now is using NullPool - engine = create_engine(sqlalchemy_url, poolclass=NullPool) - and after each job, calling session.close(). Will know after 24 hours if that is the winning combination. – Tony Feb 04 '11 at 16:45
  • Using the Nullpool may work ( honestly can't say one way or the other ) but if it doesn't, perhaps a deep clone/copy of the response object will work out ( separating your caching from the DB api ). – David Feb 04 '11 at 21:28
  • So could you solve this issue, @Tony? – fedorqui Oct 22 '13 at 13:48
  • @fedorqui It's been a while, but as I recall the solution was to close the session after each request. That means wrapping things with try/catch so you can close the session even if something errors out before a commit() happens. – Tony Oct 22 '13 at 18:16
  • @Tony thanks a lot! I was facing the same issue yesterday, so I will check if this helps in my case. I also suggest you to post an answer so future visitors can learn from your investigations :) – fedorqui Oct 23 '13 at 08:37
  • @fedorqui This is what I do that ensures there are no hanging connections: tx1 = Session() try: tx1.add(fo) tx1.commit() except Exception: tx1.rollback() raise # re-raise db error – Tony Oct 23 '13 at 13:08
  • 1
    Great! Very kind of you. Maybe you'd better post it as an answer so the try/except format is seen properly :) – fedorqui Oct 23 '13 at 13:10
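
Pulling the fix from the comments above into runnable form — a minimal sketch of the close-the-session-after-every-job pattern, assuming a standard sessionmaker factory; the do_login helper and task wiring are illustrative stand-ins, not part of the original script:

from sqlalchemy import create_engine
from sqlalchemy.orm import sessionmaker
from sqlalchemy.pool import NullPool

# NullPool opens a fresh connection on every checkout, so no stale
# connection can outlive MySQL's idle timeout between jobs
engine = create_engine(sqlalchemy_url, poolclass=NullPool)
Session = sessionmaker(bind=engine)

def login_worker(gearman_worker, gearman_job):
    session = Session()
    try:
        result = do_login(session, gearman_job.data)  # illustrative task body
        session.commit()
        return result
    except Exception:
        # roll back so the connection never sticks in the
        # "Can't reconnect until invalid transaction is rolled back" state
        session.rollback()
        raise
    finally:
        session.close()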

1 Answer


I've seen this across the board for Ruby, PHP, and Python, regardless of the DB library used. I couldn't find how to fix this the "right" way, which is to use mysql_ping, but there is a SQLAlchemy solution, explained in more detail here: http://groups.google.com/group/sqlalchemy/browse_thread/thread/9412808e695168ea/c31f5c967c135be0
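
(Aside: later SQLAlchemy releases, 1.2 and up, added exactly this ping-on-checkout behavior as pool_pre_ping=True on create_engine; it tests each connection as it is checked out of the pool and transparently replaces dead ones. It did not exist when this answer was written, so the rest of the answer sticks to pool_recycle.)

engine = create_engine(sqlalchemy_url, pool_pre_ping=True)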

As someone in that thread points out, setting the recycle option to True is equivalent to setting it to 1, i.e. recycling any connection more than a second old. A better solution is to find your MySQL connection timeout value and set the recycle threshold to 80% of it.

You can get that value from a live server by looking up this variable: http://dev.mysql.com/doc/refman/5.0/en/server-system-variables.html#sysvar_connect_timeout
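
As a concrete sketch of that 80% rule — note that connect_timeout (linked above) only covers the login handshake, while the variable that actually kills idle connections is wait_timeout, so that is the one worth reading. Assuming the same sqlalchemy_url as in the question and the Engine.execute API of that SQLAlchemy era:

from sqlalchemy import create_engine

# ask the server how long it keeps idle connections alive, then
# recycle pooled connections at 80% of that window
probe = create_engine(sqlalchemy_url)
wait_timeout = int(probe.execute("SHOW VARIABLES LIKE 'wait_timeout'").fetchone()[1])
probe.dispose()

engine = create_engine(sqlalchemy_url, pool_recycle=int(wait_timeout * 0.8))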

Edit: It took me a while to find the authoritative documentation on using pool_recycle: http://www.sqlalchemy.org/docs/05/reference/sqlalchemy/connections.html?highlight=pool_recycle

David