Will psycopg2 cursor.fetchmany() see concurrent committed transactions?

Question

Consider the following code:

import psycopg2

conn = psycopg2.connect(**credentials)
cur = conn.cursor()
cur.execute('select * from some_table')  # Imagine some_table to be a very big table
while True:
    rows = cur.fetchmany(1000)
    if not rows:
        break
    do_some_processing(rows)

cur.close()
conn.commit()

Question 1: if a concurrent transaction inserts new rows into some_table while the loop is running, will the new rows be fetched if the transaction isolation level is set to "read committed"?

Question 2: if a concurrent transaction updates some rows in some_table while the loop is running, will the updated rows be fetched if the transaction isolation level is set to "read committed"?

According to Postgres documentation:

Read Committed is the default isolation level in PostgreSQL. When a transaction uses this isolation level, a SELECT query (without a FOR UPDATE/SHARE clause) sees only data committed before the query began; it never sees either uncommitted data or changes committed during query execution by concurrent transactions. In effect, a SELECT query sees a snapshot of the database as of the instant the query begins to run. However, SELECT does see the effects of previous updates executed within its own transaction, even though they are not yet committed. Also note that two successive SELECT commands can see different data, even though they are within a single transaction, if other transactions commit changes after the first SELECT starts and before the second SELECT starts.

In the code above there's only 1 SELECT query in the transaction, which means that there are no "successive SELECT commands", so my assumption is that the cursor will not see any new inserts/updates. Is it correct? If yes, then how does the cursor "remember" the old state of the database for the whole time? What if the loop runs for several hours/days? Will such situation cause some MVCC-related disk bloat or something like that?

Bloat happens if you do UPDATE(which is basically delete/insert) or DELETE in a transaction you hold open. Then the server has to leave the old rows untouched by VACUUM should they be needed if the transaction does a ROLLBACK. For a SELECT the server use the information here [System columns] to filter the rows it returns so new rows are just not seen as they exist in the future. See this post [MVCC](https://devcenter.heroku.com/articles/postgresql-concurrency) — Adrian Klaver, Dec 18 '21 at 00:59

score 3 · Accepted Answer · answered Dec 18 '21 at 02:45

Old versions of rows will be retained for as long as the cursor is open. Yes, this means holding a cursor open for several days will risk bloat, as no currently-obsolete tuples that were valid when the cursor was opened can be removed.

You could argue that this would only need to apply to the tables which the cursor was defined over as new tables can't be added to the query later on (in read-committed mode). But this reasoning doesn't work in the case of functions with dynamic SQL, which can bring in new tables at any time. In any case, it would make the accounting much more complicated, so it isn't done. So no tuples from the era of the cursor can be removed, database wide, until the cursor is closed.

score 1 · Answer 2 · answered Dec 18 '21 at 00:44

Your cursor will see whatever records were present when the SELECT statement began, no matter how long the cursor lasts. Database servers are very good at keeping multiple "generations" of tables separate. If you use BEGIN TRANSACTION, for example, then no one else but you will see the changes that you make until there is a COMMIT TRANSACTION.

Will psycopg2 cursor.fetchmany() see concurrent committed transactions?

2 Answers2