I'm getting the following error (which I assume is because of the forking in my application), "This result object does not return rows".
Traceback
---------
File "/opt/miniconda/envs/analytical-engine/lib/python2.7/site-packages/dask/async.py", line 263, in execute_task
result = _execute_task(task, data)
File "/opt/miniconda/envs/analytical-engine/lib/python2.7/site-packages/dask/async.py", line 245, in _execute_task
return func(*args2)
File "/opt/miniconda/envs/analytical-engine/lib/python2.7/site-packages/smg/analytics/services/impact_analysis.py", line 140, in _do_impact_analysis_mp
Correlation.user_id.in_(user_ids)).all())
File "/opt/miniconda/envs/analytical-engine/lib/python2.7/site-packages/sqlalchemy/orm/query.py", line 2241, in all
return list(self)
File "/opt/miniconda/envs/analytical-engine/lib/python2.7/site-packages/sqlalchemy/orm/loading.py", line 65, in instances
fetch = cursor.fetchall()
File "/opt/miniconda/envs/analytical-engine/lib/python2.7/site-packages/sqlalchemy/engine/result.py", line 752, in fetchall
self.cursor, self.context)
File "/opt/miniconda/envs/analytical-engine/lib/python2.7/site-packages/sqlalchemy/engine/base.py", line 1027, in _handle_dbapi_exception
util.reraise(*exc_info)
File "/opt/miniconda/envs/analytical-engine/lib/python2.7/site-packages/sqlalchemy/engine/result.py", line 746, in fetchall
l = self.process_rows(self._fetchall_impl())
File "/opt/miniconda/envs/analytical-engine/lib/python2.7/site-packages/sqlalchemy/engine/result.py", line 715, in _fetchall_impl
self._non_result()
File "/opt/miniconda/envs/analytical-engine/lib/python2.7/site-packages/sqlalchemy/engine/result.py", line 720, in _non_result
"This result object does not return rows. "
I'm using dask and its multiprocessing scheduler (which uses multiprocessing.Pool).
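For context, the graph is executed with the multiprocessing scheduler roughly like this (a simplified sketch; user_id_batches is a stand-in for my real input, and _do_impact_analysis_mp is the task function from the traceback):

import dask.bag as db
import dask.multiprocessing

# Stand-in input; the real code maps the task over batches of user ids.
user_id_batches = [[1, 2, 3], [4, 5, 6]]

bag = db.from_sequence(user_id_batches, npartitions=2)
# compute(get=dask.multiprocessing.get) runs each task in a forked
# worker process drawn from a multiprocessing.Pool.
results = bag.map(_do_impact_analysis_mp).compute(get=dask.multiprocessing.get)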
As I understand it (based on the documentation), sessions created from a scoped session object (created via scoped_session()) are thread-safe, because they are thread-local. This leads me to believe that when I call Session() (or use the Session proxy) I'm getting a session object that exists in, and is accessible from, only the thread it was called from.
This seems pretty straightforward.
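In other words, my mental model is that within a single process the registry behaves like this (a quick sketch, not from my actual code):

import threading

from sqlalchemy.orm import scoped_session, sessionmaker

Session = scoped_session(sessionmaker())

# Repeated calls within one thread hand back the same session object...
assert Session() is Session()

# ...while a different thread gets its own.
sessions_from_other_thread = []
t = threading.Thread(target=lambda: sessions_from_other_thread.append(Session()))
t.start()
t.join()
assert sessions_from_other_thread[0] is not Session()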
What I'm confused about is why I'm having issues when the process is forked. I understand that you can't reuse an engine across processes, so I've followed the event-based solution from the docs verbatim and done this:
import os

from sqlalchemy import create_engine, event, exc
from sqlalchemy.ext.declarative import declarative_base
from sqlalchemy.orm import scoped_session, sessionmaker


class _DB(object):

    _engine = None

    @classmethod
    def _get_engine(cls, force_new=False):
        if cls._engine is None or force_new is True:
            # Config is our application's settings module.
            cfg = Config.get_config()
            user = cfg['USER']
            host = cfg['HOST']
            password = cfg['PASSWORD']
            database = cfg['DATABASE']
            engine = create_engine(
                'mysql://{}:{}@{}/{}?local_infile=1&'
                'unix_socket=/var/run/mysqld/mysqld.sock'.format(
                    user, password, host, database),
                pool_size=5, pool_recycle=3600)
            cls._engine = engine
        return cls._engine


# From the docs, handles multiprocessing
@event.listens_for(_DB._get_engine(), "connect")
def connect(dbapi_connection, connection_record):
    connection_record.info['pid'] = os.getpid()


# From the docs, handles multiprocessing
@event.listens_for(_DB._get_engine(), "checkout")
def checkout(dbapi_connection, connection_record, connection_proxy):
    pid = os.getpid()
    if connection_record.info['pid'] != pid:
        connection_record.connection = connection_proxy.connection = None
        raise exc.DisconnectionError(
            "Connection record belongs to pid %s, "
            "attempting to check out in pid %s" %
            (connection_record.info['pid'], pid)
        )


# The following is how I create the scoped session object.
Session = scoped_session(sessionmaker(
    bind=_DB._get_engine(), autocommit=False, autoflush=False))

Base = declarative_base()
Base.query = Session.query_property()
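For reference, the query that blows up inside the worker is roughly this (reconstructed from the traceback; Correlation is one of my declarative models):

def _do_impact_analysis_mp(user_ids):
    # Runs inside a forked dask worker process; the .all() below is
    # the call that raises "This result object does not return rows".
    return Session.query(Correlation).filter(
        Correlation.user_id.in_(user_ids)).all()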
So my assumptions (based on the docs) are the following:
1. A scoped session object must always give me a thread-local session (which in my case would just be the main thread of the child process). Although the docs don't say so explicitly, I imagine this should apply even if the scoped session object was created in another process.
2. The thread-local session will get a connection from the pool via the engine; if the connection was not created within this process, a new one will be created (per the connect() and checkout() implementations above).
If both of these things were true, then everything should "just work" (AFAICT). That's not the case though.
I managed to get it to work by creating a new scoped session object in each new process, and using that session for all subsequent calls. (The Base.query attribute needed to be re-bound to this new scoped session object as well.)
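Concretely, the workaround looks like this (a sketch; _make_process_session is just my name for it, not something from the docs), called once at the start of each worker process:

def _make_process_session():
    # Build a fresh engine and a fresh scoped session inside the
    # forked process, and re-point Base.query at the new registry.
    engine = _DB._get_engine(force_new=True)
    session = scoped_session(sessionmaker(
        bind=engine, autocommit=False, autoflush=False))
    Base.query = session.query_property()
    return session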
I imagine that my #1 assumption above is incorrect. Can anyone help me understand why I need to create a new scoped session object in each process?
Cheers.