3

I am developing an application in which I am iterating over many (1,000,000+) rows in a table while inserting new rows and updating existing rows along the way. It is a requirement that the select statement yields every row in the table (those that exist when the select is initially executed) exactly once, and never yield rows that are inserted after the select is executed. I would prefer not to load all rows into memory (that takes a long time and a lot of RAM—I tried it).

I have developed a small Python example that demonstrates that SQLite apparently does not isolate inserts (and presumably updates and deletes) from a long-running select. I was not able to find any place in the SQLite documentation that specifically mentions this behavior, but I found several links that alluded to the fact that the insert would fail (possibly in earlier versions of SQLite?), which it does not in my example.

import sqlite3

def select_affected_by_insert():
    # select from and simultaneously modify same table
    cn = sqlite3.connect(':memory:')
    cn.execute("CREATE TABLE demo (v INTEGER PRIMARY KEY)")

    n = 5
    values = [[v] for v in range(n)]
    cn.executemany('INSERT INTO demo VALUES (?)', values)

    for (v,) in cn.execute('SELECT v FROM demo'):

        with cn:
            # insert in transaction
            cn.execute('INSERT INTO demo VALUES (?)', [n + v])

        print v, n + v
        assert v < n, 'got more rows than expected!'

if __name__ == '__main__':
    select_affected_by_insert()

SQLite 3.6.12
Python 2.6.4

Is there a better way to work around this than copying the data to a separate (temporary) table and select from there?

Clarification: I neglected to say that I need to do commits inside the loop. The process may be interrupted, and partially done work must be committed so it does not need to be redone on the next run-through.

millerdev
  • 10,011
  • 2
  • 31
  • 27

2 Answers2

5
  1. Use WAL mode (so the writer and reader do not interfere)
  2. Use separate connections for the reader and writer
Doug Currie
  • 40,708
  • 1
  • 95
  • 119
2

if you add open your database in a deferred transaction mode and COMMIT at the end of your SELECT-INSERT logic, like so:

cn = sqlite3.connect(':memory:', isolation_level='DEFERRED')
...
for (v,) in cn.execute('SELECT v FROM demo'):
    cn.execute('INSERT INTO demo VALUES (?)', [n + v])
cn.commit()

Your insert statements should be deferred until the end of the block. From the SQLite Docs for Transaction Control:

If multiple commands are being executed against the same SQLite database connection at the same time, the autocommit is deferred until the very last command completes. For example, if a SELECT statement is being executed, the execution of the command will pause as each row of the result is returned. During this pause other INSERT, UPDATE, or DELETE commands can be executed against other tables in the database. But none of these changes will commit until the original SELECT statement finishes.

John Lyon
  • 11,180
  • 4
  • 36
  • 44
  • Thank you for your response. While this would be a good approach if the select were not so huge, I want to do commits along the way so I can resume the process without having to redo all the work later in the event of a program crash. – millerdev Dec 09 '11 at 18:17
  • The default `isolation_level` is `DEFERRED` anyway, so that change alone will make no difference. There's quite a few things muddled up in this answer: statements in a transaction are always immediately visible to other statements in the same transaction (in particular, all statements in the same connection object); DEFERRED won't affect that. Conversely statements in a transaction are NEVER visible to statements on another connection until it's committed; again DEFERRED is irrelevant. – Arthur Tacca Jan 02 '21 at 15:34
  • The isolation mode (except for None) just refers to how aggressively a lock is taken when the transaction is *started*, so might affect efficiency but not which changes are visible from which other statements. The quote at the end is about autocommit mode, which only applies if there's no explicit transaction at all, whereas DEFERRED ultimately gets passed as part of beginning and explicit transaction, so can never apply to autocommit mode. – Arthur Tacca Jan 02 '21 at 15:36