When using an SQLAlchemy engine for GeoPandas read_postgis()
method to read data from a PostgreSQL/PostGIS database, there are plenty of busy database slots which stay open for nothing, often showing simple COMMIT
or ROLLBACK
query statements.
Here's a test.py
code snippet which represents this problem:
#!/usr/bin/env python3
# -*- coding: utf-8 -*-
import os
os.environ['USE_PYGEOS'] = '0'
import geopandas as gpd
from sqlalchemy import create_engine, text
def foo():
engine = create_engine("postgresql://postgres:password@db:5432/postgres")
with engine.begin() as connection:
gdf0 = gpd.read_postgis(
sql = text("WITH cte AS (SELECT * FROM public.cities) SELECT *, ST_Buffer(geom,0.0001) FROM public.cities WHERE name ILIKE 'bog';"),
con=connection,
geom_col='geom',
crs=4326,
)
gdf1 = gpd.read_postgis(
sql = text("WITH cte AS (SELECT * FROM public.cities) SELECT *, ST_Buffer(geom,0.0002) FROM public.cities WHERE name ILIKE 'bra';"),
con=connection,
geom_col='geom',
crs=4326,
)
gdf2 = gpd.read_postgis(
sql = text("WITH cte AS (SELECT * FROM public.cities) SELECT *, ST_Buffer(geom,0.0003) FROM public.cities WHERE country ILIKE 'ven';"),
con=connection,
geom_col='geom',
crs=4326,
)
i=-1
while i < 100:
i+=1
foo()
As you can see, I'm using engine.begin()
, which is known as "Connect and Begin Once" in the SQLAlchemy documentation:
A convenient shorthand form for the above “begin once” block is to use the Engine.begin() method at the level of the originating Engine object, rather than performing the two separate steps of Engine.connect() and Connection.begin(); the Engine.begin() method returns a special context manager that internally maintains both the context manager for the Connection as well as the context manager for the Transaction normally returned by the Connection.begin() method:
For convenience, here is also a database initialization script:
psql -U postgres -d postgres -c "CREATE TABLE IF NOT EXISTS cities (
id int PRIMARY KEY GENERATED ALWAYS AS IDENTITY,
name text,
country text,
geom geometry(Point,4326)
);
INSERT INTO public.cities (name, country, geom) VALUES
('Buenos Aires', 'Argentina', ST_SetSRID(ST_Point(-58.66, -34.58), 4326)),
('Brasilia', 'Brazil', ST_SetSRID(ST_Point(-47.91,-15.78), 4326)),
('Santiago', 'Chile', ST_SetSRID(ST_Point(-70.66, -33.45), 4326)),
('Bogota', 'Colombia', ST_SetSRID(ST_Point(-74.08, 4.60), 4326)),
('Caracas', 'Venezuela', ST_SetSRID(ST_Point(-66.86,10.48), 4326));"
I usually run this Python code in multiple parallel instances as a dockerized application.
When using the following SQL snippet while the app is running, e.g. from pgAdmin4, I can see plenty of idle
connections which are not properly closed. They stay in an idle
state as long as the app is running, showing simple COMMIT
queries. (In the whole application, there may also be some idle ROLLBACK
queries).
SELECT state, query, *
FROM pg_stat_activity
WHERE application_name NOT ILIKE '%pgAdmin 4%'
Here is the result of the previous query:
If by any chance there are too much of those pending idle connections, I can also face the following error, certainly because there are no more available "slots" in the database for extra transactions to take place:
Traceback (most recent call last):
File "/usr/local/lib/python3.10/site-packages/sqlalchemy/engine/base.py", line 145, in __init__
self._dbapi_connection = engine.raw_connection()
File "/usr/local/lib/python3.10/site-packages/sqlalchemy/engine/base.py", line 3269, in raw_connection
return self.pool.connect()
File "/usr/local/lib/python3.10/site-packages/sqlalchemy/pool/base.py", line 452, in connect
return _ConnectionFairy._checkout(self)
File "/usr/local/lib/python3.10/site-packages/sqlalchemy/pool/base.py", line 1255, in _checkout
fairy = _ConnectionRecord.checkout(pool)
File "/usr/local/lib/python3.10/site-packages/sqlalchemy/pool/base.py", line 716, in checkout
rec = pool._do_get()
File "/usr/local/lib/python3.10/site-packages/sqlalchemy/pool/impl.py", line 168, in _do_get
with util.safe_reraise():
File "/usr/local/lib/python3.10/site-packages/sqlalchemy/util/langhelpers.py", line 147, in __exit__
raise exc_value.with_traceback(exc_tb)
File "/usr/local/lib/python3.10/site-packages/sqlalchemy/pool/impl.py", line 166, in _do_get
return self._create_connection()
File "/usr/local/lib/python3.10/site-packages/sqlalchemy/pool/base.py", line 393, in _create_connection
return _ConnectionRecord(self)
File "/usr/local/lib/python3.10/site-packages/sqlalchemy/pool/base.py", line 678, in __init__
self.__connect()
File "/usr/local/lib/python3.10/site-packages/sqlalchemy/pool/base.py", line 902, in __connect
with util.safe_reraise():
File "/usr/local/lib/python3.10/site-packages/sqlalchemy/util/langhelpers.py", line 147, in __exit__
raise exc_value.with_traceback(exc_tb)
File "/usr/local/lib/python3.10/site-packages/sqlalchemy/pool/base.py", line 898, in __connect
self.dbapi_connection = connection = pool._invoke_creator(self)
File "/usr/local/lib/python3.10/site-packages/sqlalchemy/engine/create.py", line 640, in connect
return dialect.connect(*cargs, **cparams)
File "/usr/local/lib/python3.10/site-packages/sqlalchemy/engine/default.py", line 580, in connect
return self.loaded_dbapi.connect(*cargs, **cparams)
File "/usr/local/lib/python3.10/site-packages/psycopg2/__init__.py", line 122, in connect
conn = _connect(dsn, connection_factory=connection_factory, **kwasync)
psycopg2.OperationalError: FATAL: sorry, too many clients already
The above exception was the direct cause of the following exception:
Traceback (most recent call last):
File "/app/test.py", line 42, in <module>
foo()
File "/app/test.py", line 19, in foo
with engine.begin() as connection:
File "/usr/local/lib/python3.10/contextlib.py", line 135, in __enter__
return next(self.gen)
File "/usr/local/lib/python3.10/site-packages/sqlalchemy/engine/base.py", line 3209, in begin
with self.connect() as conn:
File "/usr/local/lib/python3.10/site-packages/sqlalchemy/engine/base.py", line 3245, in connect
return self._connection_cls(self)
File "/usr/local/lib/python3.10/site-packages/sqlalchemy/engine/base.py", line 147, in __init__
Connection._handle_dbapi_exception_noconnection(
File "/usr/local/lib/python3.10/site-packages/sqlalchemy/engine/base.py", line 2410, in _handle_dbapi_exception_noconnection
raise sqlalchemy_exception.with_traceback(exc_info[4]) from e
File "/usr/local/lib/python3.10/site-packages/sqlalchemy/engine/base.py", line 145, in __init__
self._dbapi_connection = engine.raw_connection()
File "/usr/local/lib/python3.10/site-packages/sqlalchemy/engine/base.py", line 3269, in raw_connection
return self.pool.connect()
File "/usr/local/lib/python3.10/site-packages/sqlalchemy/pool/base.py", line 452, in connect
return _ConnectionFairy._checkout(self)
File "/usr/local/lib/python3.10/site-packages/sqlalchemy/pool/base.py", line 1255, in _checkout
fairy = _ConnectionRecord.checkout(pool)
File "/usr/local/lib/python3.10/site-packages/sqlalchemy/pool/base.py", line 716, in checkout
rec = pool._do_get()
File "/usr/local/lib/python3.10/site-packages/sqlalchemy/pool/impl.py", line 168, in _do_get
with util.safe_reraise():
File "/usr/local/lib/python3.10/site-packages/sqlalchemy/util/langhelpers.py", line 147, in __exit__
raise exc_value.with_traceback(exc_tb)
File "/usr/local/lib/python3.10/site-packages/sqlalchemy/pool/impl.py", line 166, in _do_get
return self._create_connection()
File "/usr/local/lib/python3.10/site-packages/sqlalchemy/pool/base.py", line 393, in _create_connection
return _ConnectionRecord(self)
File "/usr/local/lib/python3.10/site-packages/sqlalchemy/pool/base.py", line 678, in __init__
self.__connect()
File "/usr/local/lib/python3.10/site-packages/sqlalchemy/pool/base.py", line 902, in __connect
with util.safe_reraise():
File "/usr/local/lib/python3.10/site-packages/sqlalchemy/util/langhelpers.py", line 147, in __exit__
raise exc_value.with_traceback(exc_tb)
File "/usr/local/lib/python3.10/site-packages/sqlalchemy/pool/base.py", line 898, in __connect
self.dbapi_connection = connection = pool._invoke_creator(self)
File "/usr/local/lib/python3.10/site-packages/sqlalchemy/engine/create.py", line 640, in connect
return dialect.connect(*cargs, **cparams)
File "/usr/local/lib/python3.10/site-packages/sqlalchemy/engine/default.py", line 580, in connect
return self.loaded_dbapi.connect(*cargs, **cparams)
File "/usr/local/lib/python3.10/site-packages/psycopg2/__init__.py", line 122, in connect
conn = _connect(dsn, connection_factory=connection_factory, **kwasync)
sqlalchemy.exc.OperationalError: (psycopg2.OperationalError) FATAL: sorry, too many clients already
(Background on this error at: https://sqlalche.me/e/20/e3q8)
This error regularly prevents my application from working correctly with many errors, e.g.: psycopg2.OperationalError: FATAL: sorry, too many clients already
.
I conclude that the Python code above is not perfectly written. Certainly because I didn't fully understand how the SQLAlchemy engine actually works.
Hence my question: could you explain what's wrong with this code and, most importantly, how could I properly close these queries?
Version info:
- Python 3.9.7
- geopandas==0.12.2
- psycopg2==2.9.5
- SQLAlchemy==2.0.1