I have a Python program that collects monitoring data every second. The data is buffered in a SQLite database for later transmission. Database schema: unix_timestamp integer, device_id text, s0 integer, s1 integer, ..., s99 integer, with an index on the unix_timestamp column. The integer values for s0-s99 are in the range 0-65535. The maximum buffer size is 4 weeks of data, corresponding to roughly 2.5 million rows. Ultimately, three processes will access the database: the data collector, a deletion process that trims the buffer to the latest 4 weeks, and the data transmission process.
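For reference, a minimal sketch of the table setup described above (the index name is just an example; column names follow the schema, using unix_timestamp):

import sqlite3

conn = sqlite3.connect("buffer.db")
# 100 sensor columns s0..s99, values 0-65535
sensor_cols = ", ".join(f"s{i} integer" for i in range(100))
conn.execute(
    f"CREATE TABLE IF NOT EXISTS measurements "
    f"(unix_timestamp integer, device_id text, {sensor_cols})"
)
# index used by the timestamp range queries below
conn.execute(
    "CREATE INDEX IF NOT EXISTS idx_timestamp ON measurements (unix_timestamp)"
)
conn.commit()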
Because I am concerned about locks and timeouts during database interaction, I did some profiling (select, delete, insert) on a device with 512 MB RAM and a 1 GHz armv7 RISC CPU.
Working with a maximally filled database (see above), timings were recorded using pysqlite and the following code block:
import time

# conn is an open pysqlite/sqlite3 connection
start = time.time()
c = conn.cursor()
c.execute(...)  # or c.executemany(...) for the batch insert
conn.commit()   # commit() belongs to the connection, not the cursor
end = time.time()
These are the results:
- deletion of a 60s range (DELETE FROM measurements WHERE timestamp >= ? AND timestamp < ?): max. 0.872s, min. 0.094s
- insert of a 60s range (INSERT INTO measurements (...) VALUES (...); 'executemany' function, see the sketch after this list): 0.079 - 0.111s
- select a 60s range (SELECT timestamp, device_id, s0, ...,s99 FROM measurements WHERE timestamp >= ? and timestamp < ? ORDER BY timestamp): 0.031-0.034s
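For context, the insert timing above was taken with a batch insert roughly like the following sketch (the device id, timestamps, and zero sensor values are placeholders; the exact batch assembly is an assumption):

import sqlite3
import time

conn = sqlite3.connect("buffer.db")
cols = ["unix_timestamp", "device_id"] + [f"s{i}" for i in range(100)]
placeholders = ", ".join("?" for _ in cols)
sql = f"INSERT INTO measurements ({', '.join(cols)}) VALUES ({placeholders})"

# one batch = 60 rows, i.e. one row per second for a 60 s range
rows = [
    (ts, "device-1") + tuple(0 for _ in range(100))
    for ts in range(1_700_000_000, 1_700_000_060)
]

start = time.time()
c = conn.cursor()
c.executemany(sql, rows)
conn.commit()
print(f"insert of 60 rows took {time.time() - start:.3f}s")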
So I have several questions here:
- From your experience/best guess: are these timings in the expected range for the scenario detailed above, or can this be sped up using other database settings or another Python SQLite library?
- Is it possible to identify the bottleneck, e.g. whether it would be worth looking for SD cards with higher read/write rates?
- Recorded times vary a lot (e.g. an insert as detailed above occasionally takes up to ~1s): is this due to loading DB pages from disk versus working with cached pages, or how else can this be explained?
- Presumably, the lock held by a writing process will be shorter than my measured timings (since these also include SQLite's parsing and code generation): is there something like a rule of thumb for the ratio of the time the lock is held to the overall time spent in the SQLite library?
- I tried 'PRAGMA journal_mode=WAL;' as well (a sketch of how I enable it follows this list). This speeds up the write operations (insert, delete), but I am not sure whether it will work if the processes interacting with the DB run in different Docker containers. The docs on WAL state: '... the use of shared memory means that all readers must exist on the same machine. This is why the write-ahead log implementation will not work on a network filesystem.' Does this imply that running the processes in different Docker containers will be a problem as well?
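This is roughly how I enable WAL when opening the connection; the database file sits on a volume mounted into each container (the path and timeout value are just examples):

import sqlite3

conn = sqlite3.connect("/data/buffer.db", timeout=30)  # timeout: seconds to wait on a locked DB
conn.execute("PRAGMA journal_mode=WAL;")    # WAL mode is persistent, set once per database file
conn.execute("PRAGMA synchronous=NORMAL;")  # commonly paired with WAL; optional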