
I am working on a task where I need to loop through a bunch of IDs and fetch the relevant data from a PostgreSQL 12.7 server. I have added a for loop in Python, along with all the relevant code to connect to the database using psycopg2 version 2.9.1:

import psycopg2

connectStr = "dbname='dbName' user='postgres' password='test123' host='localhost'"
conn = psycopg2.connect(connectStr)
dataList = []
with conn.cursor() as newCursor:
    query = "select * from myTable where ID = <ID in the loop>"
    newCursor.execute(query)
    for row in newCursor:
        dataList.append(row[0])

This code runs fine for some IDs but then goes into a long-running state (more than 2 hours) for one random ID. When that stuck ID is taken out and run separately, it executes in about 30 seconds, which is also the average execution time per ID.

I checked the database-level locks and the pg_stat_activity view using this query:

SELECT
    activity.pid,
    activity.usename,
    activity.query,
    blocking.pid AS blocking_id,
    blocking.query AS blocking_query
FROM pg_stat_activity AS activity
JOIN pg_stat_activity AS blocking ON blocking.pid = ANY(pg_blocking_pids(activity.pid));

but as far as I can tell the stuck query is just long-running; nothing shows up as blocking it.

Using the query select * from pg_stat_activity where state = 'active', I can see "BufFileRead" in the wait_event column for a few milliseconds before it goes back to NULL.

I am explicitly closing all cursors and connections after each loop iteration, but it still gets stuck on a "random" ID.

Can someone please guide me on how to resolve this? I can share more information as needed.

Thank you,

Sanket

  • Have you added an index for the column ID? A B-Tree Index or Hash index might speed up the SELECT statement. – shreeraviadhikari Dec 30 '21 at 07:45
  • @raviadhikari - yes, the B-Tree index is added already for that column. The main problem here is the query getting randomly stuck after running fine for some IDs. – Sanket Kelkar Dec 30 '21 at 08:03
  • using this - "select * from pg_stat_activity where state = 'active'" I am able to see "BufFileRead" in the wait_event column for a few milliseconds, then it goes back to NULL. – Sanket Kelkar Dec 30 '21 at 08:06
  • As you're using a loop, can you pinpoint which query gets stuck? – shreeraviadhikari Dec 30 '21 at 08:33
  • Your python code doesn't show any loops over the execute, only under it. – jjanes Dec 30 '21 at 15:38
  • Since the problem seems to be reproducible, I would reproduce it on a test server, then try to use https://github.com/StarfishStorage/explain-running-query to try to extract the plan it is running. – jjanes Dec 30 '21 at 15:53

1 Answer


Using a for loop to query the database one ID at a time is almost always a bad idea. It puts a high load on the database and the overall response is much slower. In most cases the total time is also unpredictable, which is exactly what happened in your case.

One way of optimizing your query is to fetch all the IDs at once:

SELECT * FROM myTable WHERE ID IN (1, 2, 3, 4);

You can build the list of IDs beforehand and then execute the query once. You're running a single query here instead of N queries, so it will perform much faster. The basic rule of thumb for DB performance is to fetch your data with as few queries as possible; in your case a single query is enough. If you run N queries over N iterations, your DB performance will suffer drastically. Also, an ID column is normally the primary key, so I'm assuming that here; if ID is not a primary key, make it one or add an index on it. A sketch of the single-query approach in psycopg2 is shown below.
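For example, with psycopg2 you can pass the whole list as one parameter. This is a rough sketch, not your exact code: it reuses the table, column, and connection string from the question and assumes id_list holds the IDs you currently loop over.

import psycopg2

# id_list stands in for the list of IDs you currently iterate over
id_list = [101, 102, 103]

connectStr = "dbname='dbName' user='postgres' password='test123' host='localhost'"
with psycopg2.connect(connectStr) as conn:
    with conn.cursor() as newCursor:
        # psycopg2 adapts a Python list to a PostgreSQL array,
        # so ANY(%s) matches every ID in id_list with a single query
        newCursor.execute("SELECT * FROM myTable WHERE ID = ANY(%s)", (id_list,))
        dataList = [row[0] for row in newCursor]

psycopg2 also expands a Python tuple into an IN (...) list, so newCursor.execute("SELECT * FROM myTable WHERE ID IN %s", (tuple(id_list),)) works as well; either way the database sees one query for the whole batch instead of N separate round trips.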

Sadman Muhib Samyo