Consider the following code:
import psycopg2

conn = psycopg2.connect(**credentials)
cur = conn.cursor()
cur.execute('select * from some_table')  # imagine some_table is a very big table
while True:
    rows = cur.fetchmany(1000)
    if not rows:
        break
    do_some_processing(rows)
cur.close()
conn.commit()
Question 1: if a concurrent transaction inserts new rows into some_table
while the loop is running, will the new rows be fetched if the transaction isolation level is set to "read committed"?
Question 2: if a concurrent transaction updates some rows in some_table
while the loop is running, will the updated rows be fetched if the transaction isolation level is set to "read committed"?
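For concreteness, the concurrent writer I have in mind would be something like the following sketch, running in a second connection while the loop above is still fetching (the column names and values are made up):

import psycopg2

# Hypothetical concurrent writer in a second connection; the column
# names and values are placeholders.
writer = psycopg2.connect(**credentials)
wcur = writer.cursor()
wcur.execute("insert into some_table (id, payload) values (%s, %s)", (42, 'new row'))
writer.commit()  # commits while the other session's SELECT is still being consumed
wcur.close()
writer.close()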
According to the Postgres documentation:
Read Committed is the default isolation level in PostgreSQL. When a transaction uses this isolation level, a SELECT query (without a FOR UPDATE/SHARE clause) sees only data committed before the query began; it never sees either uncommitted data or changes committed during query execution by concurrent transactions. In effect, a SELECT query sees a snapshot of the database as of the instant the query begins to run. However, SELECT does see the effects of previous updates executed within its own transaction, even though they are not yet committed. Also note that two successive SELECT commands can see different data, even though they are within a single transaction, if other transactions commit changes after the first SELECT starts and before the second SELECT starts.
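If I read that last sentence right, two successive SELECTs in one read-committed transaction could behave like this (a sketch, with made-up row counts):

import psycopg2

conn = psycopg2.connect(**credentials)
cur = conn.cursor()

cur.execute('select count(*) from some_table')
print(cur.fetchone())  # e.g. (1000,)

# ... suppose another session inserts a row and commits here ...

# Same transaction, but a new statement: under read committed it takes
# a fresh snapshot, so it can see the concurrently committed row.
cur.execute('select count(*) from some_table')
print(cur.fetchone())  # e.g. (1001,)

conn.rollback()
conn.close()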
In the code above there is only one SELECT query in the transaction, which means there are no "successive SELECT commands", so my assumption is that the cursor will not see any new inserts or updates. Is that correct? If yes, how does the cursor "remember" the old state of the database for the whole time? What if the loop runs for several hours or days? Would such a situation cause some MVCC-related disk bloat or something like that?
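In case it matters for the answer: I'm aware that a default psycopg2 cursor pulls the whole result set into client memory when execute() returns, so assume the real code uses a server-side named cursor along these lines (the cursor name is arbitrary), which actually streams rows while the transaction and its snapshot stay open:

import psycopg2

conn = psycopg2.connect(**credentials)
# A named cursor maps to a server-side DECLARE ... CURSOR in psycopg2,
# so rows are fetched from the server in batches instead of being
# pulled into client memory all at once by execute().
cur = conn.cursor(name='big_table_cursor')  # the name is arbitrary
cur.execute('select * from some_table')
while True:
    rows = cur.fetchmany(1000)
    if not rows:
        break
    do_some_processing(rows)
cur.close()
conn.commit()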