16

On one of my servers (i7 Ivy Core, 32 GB RAM, Debian 6 @ 64bit, PHP 5.4.10) I experience extremely slow inserts with SQLite. The following test program reports just 2.2 inserts/second (14 seconds for inserting 30 rows).

unlink("test.db");

$db = new PDO('sqlite:test.db');

$db->exec("CREATE TABLE test (dummy INT)");

$count = 30;

$t = microtime(true);
for ($i=0; $i<$count; $i++) {
  $db->exec("INSERT INTO test VALUES ($i)")
   or die("SQLite error: ".$db->errorInfo()[2]."\n");
}
$elapsed = microtime(true)-$t;
echo sprintf("%d inserts in %.3f secs (%.1f q/s)\n", 
  $count, $elapsed, $count/$elapsed);

Output:

$ time php test.php
30 inserts in 13.911 secs (2.2 q/s)

real    0m14.634s
user    0m0.004s
sys     0m0.016s

I know that I can accelerate this by wrapping BEGIN/END around the insert statements (which gives me 200000 q/s) but even without a transaction this should be much faster. On other (older) machines (same PHP version) I reach 300+ queries/sec without explicit transactions.

What could be the cause for this? Do I have to tune Sqlite or the O/S?

LSerni
  • 55,617
  • 10
  • 65
  • 107
Udo G
  • 12,572
  • 13
  • 56
  • 89
  • Is there anything else taking up HDD-s I/Os? – N.B. Mar 14 '13 at 15:37
  • CPU is 85% idle, HDD writes are constantly about 500 K/s (mostly caused by `[jbd2/md*-8]`) – Udo G Mar 14 '13 at 15:45
  • It's not the amount of data being written each second, it's the amount of input-output operations caused by mentioned program. Most db transactions aren't larger than a few kilobytes, but if they take up all 300 IOPS that mechanical drives (usually) have, then you'll get low troughput. – N.B. Mar 14 '13 at 15:49
  • When you do not use explicit transactions, SQLite will automatically wrap a transaction around *each* `INSERT` command. – CL. Mar 14 '13 at 15:58

1 Answers1

41

I have done a similar test on a Linux 64bit machine using strace -C -tt to have an idea of where SQLite3 is taking time.

% time     seconds  usecs/call     calls    errors syscall
------ ----------- ----------- --------- --------- ----------------
 99.03    0.004000          32       124           fsync
  0.64    0.000026           0       222           mprotect
  0.32    0.000013           0       216           munmap

The obvious delay is in the fsync function, which is:

  • configurable
  • depends on general disk I/O (check out iotop, iostat)
  • heavily depends on IOSS (therefore, the file system and disk allocation - you might get one value on ext3, a different one on xfs, and a third one on btrfs)
  • depends of course, indirectly, on underlying hardware and its quirks or tunings.

By turning syncing off, my SQLite3 performance increases by a factor of around three thousand:

$db = new PDO('sqlite:test.db');

$db->exec('pragma synchronous = off;');

I too have two different values on two very similar machines (one has ext4, the other XFS, but I'm not positive this is the main reason - their load profiles are also different).

By the way, using prepared statements just about doubles the execution speed at the fastest level (from 45k to 110k INSERTs, in batches of 3000 since at that speed 30 INSERTs are bound to give spurious timings), and raises the lowest speed from about 6 to about 150.

So this (using prepared statements) might be a good solution to improve repeated operations without touching file synchronization, i.e., while still being demonstrably sure that data risk level remains the same. After that I'd try transactions or fsync (maybe even memory journaling) depending on the risk and worth of a data outage.

When designing a system from the ground up, some tests on different FS's are surely advisable.

Tests on different file systems (same machine)

ext4 (acl,user_xattr,data=order)         5.5 queries/s
using transactions                       170 queries/s
disabling fsync                        16000 queries/s
using transactions and disabling fsync 47200 queries/s

On a temporary file system, fsync is cheap, so turning it off yields little benefit. Most of the time is spent guarding, so transactions are key.

tmpfs                                  13700 queries/s
disabling fsync                        15350 queries/s
enabling transactions                  47900 queries/s
using transactions and disabling fsync 48200 queries/s

Of course, proper data organization and indexing has to be taken into account and, for large data sets, might well turn out to be more important.


UPDATE: to squeeze some more performance, one can also put the SQLite journal into memory with pragma journal_mode=MEMORY;

Also, you can tell ext3/4 not to bother updating atime on SQLite databases (this hugely depends on the implementation, though). You can try adding noatime to the file system where the database resides, and if it works, you can put it into /etc/fstab (you can also use relatime instead of the more extreme noatime:

sudo mount /var -o remount,noatime
LSerni
  • 55,617
  • 10
  • 65
  • 107
  • Correct! This gives me up to 10 kq/sec :-) Do I risk data loss using this solution (fs is `ext4` with journaling)? – Udo G Mar 14 '13 at 15:49
  • 2
    Difficult to say. `fsync` only guarantees IOSS integrity, **not** data integrity. For that, you need transactions and full commit (i.e., the disk physical info must *always* be in a consistent, resumable state). And it probably is not: if you have disk caching and a FS with no barrier, *then fsync can not save you in case of power loss*. So removing `fsync` altogether does not increase significantly the existing risk, which is very near that of a power loss. What *is* that risk? I can't say. I'd think *UPS and good quality hardware* before thinking FS or fsync, though :-) – LSerni Mar 14 '13 at 15:55
  • Thanks for the additional hints. In my case I have very different queries that are not related to each other (I even manage several hundreds of databases), so transactions are not always possible. Anyway that `pragma` solution works like a charm and in my case data loss is not such a big problem as long as the .db file remains still usable. – Udo G Mar 15 '13 at 08:57
  • 1
    I was having a hell of a time with some robolectric database tests when running on linux. Changing this pragma setting made them about 3 thousand times faster. – kingargyle Jul 16 '15 at 19:48
  • 1
    FWIW, I was running on an HFS+ fs and this was very helpful. – Garr Godfrey Mar 31 '18 at 00:47