3

I have a simple service application: python, tornado web server, sqlite database. The database is encrypted.

The problem is that processing even very simple http request takes about 300msec.

From logs I can see that most of that time takes processing of the very first sql request, no matter how simple this first request is. Subsequent sql requests are processed much faster. But then server starts processing next http request, and again the first sql request is very slow.

If I turn off the database encryption the problem is gone: processing time of sql requests does not depend on if the request is first or not and my server response time decreases by factor 10 to 15.

I do not quite understand what's going on. Looks like sqlalchemy reads and decrypts the database file each time it starts new session. Is there any way to workaround this problem?

lesnik
  • 2,507
  • 2
  • 25
  • 24
  • Please read [ask]. Can you share the relevant code with us in the form of a [mcve]? – ChrisGPT was on strike Oct 24 '18 at 18:56
  • @Chris Creating a very small test example is a good idea, I'll do it. But I doubt it will be small enough to provide the whole code here. I sincerely hoped that my description would be enough for someone to identify the problem. – lesnik Oct 24 '18 at 19:07
  • Encryption carries overhead. "The problem" may be that simple, but I don't think that's the answer you're looking for. – ChrisGPT was on strike Oct 24 '18 at 19:07
  • Related: https://stackoverflow.com/questions/22739590/how-to-share-single-sqlite-connection-in-multi-threaded-python-application – Ilja Everilä Oct 24 '18 at 21:12

2 Answers2

6

Due to how pysqlite, or the sqlite3 module, works SQLAlchemy defaults to using a NullPool with file-based databases. This explains why your database is decrypted per each request: a NullPool discards connections as they are closed. The reason why this is done is that pysqlite's default behaviour is to disallow using a connection in more than one thread, and without encryption creating new connections is very fast.

Pysqlite does have an undocumented flag check_same_thread that can be used to disable the check, but sharing connections between threads should be handled with care and the SQLAlchemy documentation makes a passing mention that the NullPool works well with SQLite's file locking.

Depending on your web server you could use a SingletonThreadPool, which means that all connections in a thread are the same connection:

engine = create_engine('sqlite:///my.db',
                       poolclass=SingletonThreadPool)

If you feel adventurous and your web server does not share connections / sessions between threads while in use (for example using a scoped session), then you could try using a different pooling strategy paired with check_same_thread=False:

engine = create_engine('sqlite:///my.db',
                       poolclass=QueuePool,
                       connect_args={'check_same_thread':False})
Ilja Everilä
  • 50,538
  • 7
  • 126
  • 127
  • Thank you. Using SingletonThreadPool did speed up some requests. My big concern now is how this would affect concurrent requests modifying database. – lesnik Oct 25 '18 at 09:55
  • If you're using the thread per request model and either manually handle creating local-to-thread or use a scoped session, you should be fine. But I've some doubts about concurrent access in an async setting: it could be that a thread is running multiple async request handlers concurrently, which will lead to trouble. In that situation the singleton thread pool is not suitable, but a queue pool should be fine (I think), since a handler checks out a connection and only at the end of a request hands it back. Locking may become an issue as well, but that's a territory I'm not that familiar with. – Ilja Everilä Oct 25 '18 at 10:01
0

To encrypt database sqlcipher creates a key from the passphrase I provided. This operation is resource consuming by design.

But it is possible to use not a passphrase, but 256-bit raw key. In this case sqlcipher would not have to generate the encryption key.

Originally my code was:

session.execute('PRAGMA KEY = "MY_PASSPHRASE";')

To use raw key I changed this line to:

session.execute('''PRAGMA KEY = "x'<the key>'";''')

where <the key> is 64 characters long string of hexadecimals.

Result is 20+ times speed up on small requests.

Just for reference: to convert database to use new encryption key the following commands should be executed:

PRAGMA KEY = ""MY_PASSPHRASE";
PRAGMA REKEY = "x'<the key>'";

Related question: python, sqlite, sqlcipher: very poor performance processing first request

Some info about sqlcipher commands and difference between keys and raw keys: https://www.zetetic.net/sqlcipher/sqlcipher-api/

lesnik
  • 2,507
  • 2
  • 25
  • 24