1

I have two tables as follows (no indices):

CREATE TABLE dag(
id integer primary key,
v integer,
e integer,
cc integer,
h integer,
w integer,
ls integer,
le blob, 
am blob);

CREATE TABLE additional_comp(
id integer primary key,
did integer,
t integer,
ubd integer,
lbd integer,
tlbd integer);

My program (in C) generates a few million entries (stored in two balanced binary search trees in RAM) and tries to insert them into these two tables. I have 50000 insertions per transaction and output time used in a log, here it is:

dag has 1573505 records
comp has 2375074 records
db_dag has 5 records
db_comp has 11 records
insert dags, Mon Nov 6 15:42:11 2017
0 rows done, Mon Nov 6 15:42:11 2017
50000 rows done, Mon Nov 6 16:45:57 2017
100000 rows done, Mon Nov 6 18:25:22 2017
150000 rows done, Mon Nov 6 20:29:01 2017
200000 rows done, Mon Nov 6 22:36:13 2017
250000 rows done, Tue Nov 7 03:52:03 2017
300000 rows done, Tue Nov 7 10:01:38 2017
350000 rows done, Tue Nov 7 17:33:52 2017
400000 rows done, Tue Nov 7 22:49:14 2017
450000 rows done, Wed Nov 8 06:06:28 2017
500000 rows done, Wed Nov 8 11:33:49 2017
550000 rows done, Wed Nov 8 16:53:04 2017
600000 rows done, Wed Nov 8 17:08:46 2017

650000 rows done, Wed Nov 8 17:10:16 2017
700000 rows done, Wed Nov 8 17:11:19 2017
750000 rows done, Wed Nov 8 17:12:17 2017
800000 rows done, Wed Nov 8 17:13:03 2017
850000 rows done, Wed Nov 8 17:13:49 2017
900000 rows done, Wed Nov 8 17:14:48 2017
950000 rows done, Wed Nov 8 17:15:30 2017
1000000 rows done, Wed Nov 8 17:16:11 2017
1050000 rows done, Wed Nov 8 17:16:51 2017
1100000 rows done, Wed Nov 8 17:17:43 2017
1150000 rows done, Wed Nov 8 17:18:26 2017
1200000 rows done, Wed Nov 8 17:19:13 2017
1250000 rows done, Wed Nov 8 17:20:14 2017
1300000 rows done, Wed Nov 8 17:20:59 2017
1350000 rows done, Wed Nov 8 17:21:43 2017
1400000 rows done, Wed Nov 8 17:22:21 2017
1450000 rows done, Wed Nov 8 17:22:57 2017
1500000 rows done, Wed Nov 8 17:23:33 2017
1550000 rows done, Wed Nov 8 17:24:03 2017
.....done
insert comps, Wed Nov 8 17:24:13 2017
1600000 rows done, Wed Nov 8 17:24:21 2017
1650000 rows done, Wed Nov 8 17:24:22 2017
1700000 rows done, Wed Nov 8 17:24:23 2017
1750000 rows done, Wed Nov 8 17:24:23 2017
1800000 rows done, Wed Nov 8 17:24:23 2017
1850000 rows done, Wed Nov 8 17:24:24 2017
1900000 rows done, Wed Nov 8 17:24:24 2017
1950000 rows done, Wed Nov 8 17:24:24 2017
2000000 rows done, Wed Nov 8 17:24:25 2017

Note that it was extremely slow (several hours per 50000 insertions) at the beginning (bold part), but suddenly sped up (a couple minutes per 50000 insertions). Then for the comp table, the speed was quite stable.

Another instance of the same program was running on another core (inserting into a different copy of the database), no other "big" programs running on the machine. The same slow-->fast pattern happened with it as well around (but not exactly at) the same time. This makes me feel like that something happened before Nov 8 17:08 that slows down the whole machine.

But the machine has 16 cores (as mentioned, only 2 was in use) and 300GB RAM (about 2% was in use). The question is, what kind of things may have happened that make sqlite slow down this much? Or if it was not the machine, was I doing something wrong with sqlite? Any suggestion/idea is appreciated.

ddbrake
  • 11
  • 1
  • Are you inserting the `id` values, or using autoincrement? In the first case, in which order? – CL. Nov 09 '17 at 08:25
  • Yes, I'm inserting the `id` values. Each entry is assigned a unique (auto-incremented) `id` when it is generated then it is inserted into the tree based on another key (combination of `v`, `e`, etc., so it can be searched from the tree efficiently later). So eventually the in-order traversal will prepare insertions in some indefinite order according to `id`. Is sqlite checking for criteria of primary keys making insertions slow? But why did it speed up later? – ddbrake Nov 09 '17 at 22:30
  • Insertion works fastest if the row is appended at the end of the table, i.e., if the `rowid` is larger than all previous ones. I don't know how your ID values are generated. – CL. Nov 10 '17 at 07:40

0 Answers0