I have an in-memory SQLite database that is being created dynamically based on user input using sqlite3 in Python3.9. I have one update statement that is an issue in terms of speed. I have simplified the case to only include the SQLite and not Python.
TABLES
The table to be updated is called West100_1
which contains two id columns: id
and pseudo_id
. pseudo_id
is NULL in most places, and there are a maximum of 4 values for pseudo_id
where the values for id
are all the same. All other columns are in the configureation of {taxon]_PL, which in this case are 21 columns. This table can get quite big, upwards of a million rows.
the other tables are:
100_1
a map of points with the sameid
s as inWest100_1
, so can also get quite big. Contains columns of the size (resolution
) of each point which has two possible values, and the associated communityveg_com
(and a number of other columns such as geometry that aren't relevant here).sampling_sites
which contains the name and point geometry (x and y) of sampling sites. Small table, in this case 2 rowsvegcom
which contains the names of the vegetation communities and the percentage cover of a taxon in that community. Small table, usually no more than 1000 rows.dist_dir
contains a lookup table of the distance and direction from sampling sites for eachid
. Same number of rows as {map_name}.pseudo_points
contains the same lookup table but for the pseudo points. Very small, usually no more than 40 rows.DistanceLookup
a lookup table containing per plant taxon a weighting value for each possible distance. Fairly big, usually about a quarter of the number of rows of the main table.windrose
a small lookup table with two colums: cardinal direction and weighting. Small, always 8 rows.
QUERY
THe query is essentially a mathmatical formula for which the inputs around taken from the various tables. The query needs to be run multiple times, once for each entry in sampling_sites
(example here being "West") * each entry in taxa
(example here being Cyperaceae). In this case that means 42 times. The query below is ONE of these queries, but they are otherwise all the same in construction.
UPDATE West100_1 SET "Cyperaceae_PL" =
(0.87 *
(SELECT "vegcom_percent" FROM vegcom WHERE "taxon_code" = "Cyperaceae" AND "veg_com" =
(SELECT "veg_com" FROM "100_1" WHERE (id = West100_1.id)
)
)
*
(CASE WHEN West100_1.pseudo_id NOT NULL THEN
(SELECT "Cyperaceae_DW" FROM DistanceLookup WHERE DistanceLookup.distance =
(SELECT "distance" FROM pseudo_points WHERE (pseudo_id = West100_1.pseudo_id)
)
)
ELSE
(SELECT "Cyperaceae_DW" FROM DistanceLookup WHERE DistanceLookup.distance =
(SELECT "distance" FROM dist_dir WHERE (id = West100_1.id) AND (site_name = "West")
)
)
END)
*
(SELECT "windrose_weight" FROM windrose WHERE
(CASE WHEN West100_1.pseudo_id is NULL THEN
("direction" =
(SELECT "direction" FROM dist_dir WHERE ((id = West100_1.id) AND (site_name = "West"))
)
)
ELSE
("direction" =
(SELECT "direction" FROM pseudo_points WHERE (pseudo_id = West100_1.pseudo_id)
)
)
END)
)
)
I have applied indices wherever I could think to add them, and that results in this query plan. Unfortunately to avoid an error in the overarching programme earlier on I have to copy all the tables which makes using keys impossible as (I think) there exists no way to copy tables with keys intact.
QUERY PLAN
[(4, 0, 0, 'SCAN West100_1'),
(21, 0, 0, 'CORRELATED SCALAR SUBQUERY 2'),
(26, 21, 0, 'SEARCH vegcom USING INDEX sqlite_autoindex_vegcom_1 (veg_com=? AND taxon_code=?)'),
(28, 21, 0, 'CORRELATED SCALAR SUBQUERY 1'),
(33, 28, 0, 'SEARCH 100_1 USING INDEX 100_1_idx (id=?)'),
(57, 0, 0, 'CORRELATED SCALAR SUBQUERY 4'),
(62, 57, 0, 'SEARCH DistanceLookup USING INDEX sqlite_autoindex_DistanceLookup_1 (distance=?)'),
(64, 57, 0, 'CORRELATED SCALAR SUBQUERY 3'),
(69, 64, 0, 'SEARCH pseudo_points USING INDEX sqlite_autoindex_pseudo_points_1 (pseudo_id=?)'),
(92, 0, 0, 'CORRELATED SCALAR SUBQUERY 6'),
(97, 92, 0, 'SEARCH DistanceLookup USING INDEX sqlite_autoindex_DistanceLookup_1 (distance=?)'),
(99, 92, 0, 'CORRELATED SCALAR SUBQUERY 5'),
(104, 99, 0, 'SEARCH dist_dir USING INDEX dist_dir_id_name_idx (id=? AND site_name=?)'),
(128, 0, 0, 'CORRELATED SCALAR SUBQUERY 9'),
(132, 128, 0, 'SCAN windrose'),
(138, 128, 0, 'CORRELATED SCALAR SUBQUERY 7'),
(143, 138, 0, 'SEARCH dist_dir USING INDEX dist_dir_id_name_idx (id=? AND site_name=?)'),
(160, 128, 0, 'CORRELATED SCALAR SUBQUERY 8'),
(165, 160, 0, 'SEARCH pseudo_points USING INDEX sqlite_autoindex_pseudo_points_1 (pseudo_id=?)')]
I am not very proficient at reading the query plans, is there any index that I am missing?
In an example where West100_1
has 284903 points, the combined 41 queries take ~60 seconds to run, so about 1.5 seconds per query. In the further code this needs to be done many times (more than 1000 times), causing the code to run for about a day so I would really like to shave off time any way I can.
table examples (top 10 rows):
West100_1 (before filling)
msa_id | pseudo_id | Alnus_PL | Apiaceae_PL | Ast_lactuceae_PL | Betula_PL | Cerealia_PL | Chenopodiaceae_PL | Corylus_PL | Cyperaceae_PL | Filipendula_PL | Fraxinus_PL | Plantago_PL | Quercus_PL | Ranunculaceae_PL | Rubiaceae_PL | Rumex_PL | Salix_PL | Sambucus_PL | Solidago_PL | Tilia_PL | Urtica_PL | Poaceae_PL | Ulmus_PL |
---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
1 | [NULL] | ||||||||||||||||||||||
2 | [NULL] | ||||||||||||||||||||||
3 | [NULL] | ||||||||||||||||||||||
4 | [NULL] | ||||||||||||||||||||||
5 | [NULL] | ||||||||||||||||||||||
6 | [NULL] | ||||||||||||||||||||||
7 | [NULL] | ||||||||||||||||||||||
8 | [NULL] | ||||||||||||||||||||||
9 | [NULL] | ||||||||||||||||||||||
10 | [NULL] |
100_1
msa_id | geom_X | geom_Y | veg_com | chance_to_happen | resolution | lidar_su1 | 50m_basin | 2m_basin | 5m_basin | 10m_basin | ESB_DESC |
---|---|---|---|---|---|---|---|---|---|---|---|
53607 | 456224.3438 | 412147.5 | Alnus Carr | 0 | 1 | 4.178999901 | PEAT | ||||
53606 | 456223.3438 | 412147.5 | Alnus Carr | 0 | 1 | 4.179999828 | |||||
53609 | 456226.3438 | 412147.5 | Alnus Carr | 0 | 1 | 4.209000111 | PEAT | ||||
53608 | 456225.3438 | 412147.5 | Alnus Carr | 0 | 1 | 4.196000099 | PEAT | ||||
53603 | 456220.3438 | 412147.5 | Alnus Carr | 0 | 1 | 4.175000191 | PEAT | ||||
53602 | 456219.3438 | 412147.5 | Alnus Carr | 0 | 1 | 4.169000149 | PEAT | ||||
53605 | 456222.3438 | 412147.5 | Alnus Carr | 0 | 1 | 4.188000202 | PEAT | ||||
53604 | 456221.3438 | 412147.5 | Phramitetalia_Fen | 0 | 1 | 4.191999912 | PEAT | ||||
53599 | 456216.3438 | 412147.5 | Alnus Carr | 0 | 1 | 4.164000034 | PEAT | ||||
53598 | 456215.3438 | 412147.5 | Alnus Carr | 0 | 1 | 4.168000221 | PEAT |
sampling sites
site_name | sample_x | sample_y | sample_is_lake | snapped_x | snapped_y | msa_id |
---|---|---|---|---|---|---|
West | 456328.7 | 412024.2 | no | 456328.3438 | 412048.5 | 88361 |
East | 456811.0 | 412150.0 | no | 456811.3438 | 412174.5 | 219294 |
vegcom
veg_com | taxon_code | vegcom_percent |
---|---|---|
Quercus_woodland | Alnus | 0.0 |
Quercus_woodland | Betula | 1.0 |
Quercus_woodland | Cyperaceae | 0.0 |
Quercus_woodland | Filipendula | 0.0 |
Quercus_woodland | Quercus | 68.0 |
Quercus_woodland | Poaceae | 5.0 |
Quercus_woodland | Rubiaceae | 0.0 |
Quercus_woodland | Salix | 0.0 |
Quercus_woodland | Sambucus | 0.0 |
Quercus_woodland | Urtica | 0.0 |
dist_dir
msa_id | site_name | geom_X | geom_Y | distance | direction |
---|---|---|---|---|---|
1 | West | 451594.8438 | 417062.0 | 6895.013016666466 | SE |
2 | West | 451644.8438 | 417062.0 | 6860.783810906739 | SE |
3 | West | 451694.8438 | 417062.0 | 6826.749189768143 | SE |
4 | West | 451744.8438 | 417062.0 | 6792.912078041346 | SE |
5 | West | 451794.8438 | 417062.0 | 6759.275441939025 | SE |
6 | West | 451844.8438 | 417062.0 | 6725.842289260134 | SE |
7 | West | 451894.8438 | 417062.0 | 6692.615669527125 | SE |
8 | West | 451944.8438 | 417062.0 | 6659.598674094408 | SE |
9 | West | 451994.8438 | 417062.0 | 6626.7944362263115 | SE |
10 | West | 452044.8438 | 417062.0 | 6594.206131142701 | SE |
pseudo_points
pseudo_id | site_name | msa_id | direction | distance | geom_X | geom_Y |
---|---|---|---|---|---|---|
0 | West | 88361 | NE | 17.67766952966369 | 456353.3438 | 412073.5 |
1 | West | 88361 | SE | 17.67766952966369 | 456353.3438 | 412023.5 |
2 | West | 88361 | SW | 17.67766952966369 | 456303.3438 | 412023.5 |
3 | West | 88361 | NW | 17.67766952966369 | 456303.3438 | 412073.5 |
4 | East | 219294 | NE | 17.67766952966369 | 456836.3438 | 412199.5 |
5 | East | 219294 | SE | 17.67766952966369 | 456836.3438 | 412149.5 |
6 | East | 219294 | SW | 17.67766952966369 | 456786.3438 | 412149.5 |
7 | East | 219294 | NW | 17.67766952966369 | 456786.3438 | 412199.5 |
DistanceLookup
distance | Alnus_DW | Apiaceae_DW | Ast_lactuceae_DW | Betula_DW | Cerealia_DW | Chenopodiaceae_DW | Corylus_DW | Cyperaceae_DW | Filipendula_DW | Fraxinus_DW | Plantago_DW | Quercus_DW | Ranunculaceae_DW | Rubiaceae_DW | Rumex_DW | Salix_DW | Sambucus_DW | Solidago_DW | Tilia_DW | Urtica_DW | Poaceae_DW | Ulmus_DW |
---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
6895.013016666466 | 0.001357056 | 0.0005537155 | 0.0003402101 | 0.001235861 | 0.0001178434 | 0.001428499 | 0.001193509 | 0.0007838263 | 0.001206789 | 0.001318041 | 0.001065177 | 0.0007838263 | 0.001536813 | 0.001428499 | 0.001459729 | 0.001318041 | 0.001539254 | 0.001487034 | 0.0008993352 | 0.001305283 | 0.0007838263 | 0.0008993352 |
6860.783810906739 | 0.001371135 | 0.0005600131 | 0.0003442246 | 0.001248859 | 0.0001193351 | 0.001443179 | 0.001206113 | 0.0007924783 | 0.001218448 | 0.001331777 | 0.001076586 | 0.0007924783 | 0.001552245 | 0.001443179 | 0.001474663 | 0.001331777 | 0.001554632 | 0.001502182 | 0.0009091354 | 0.001317954 | 0.0007924783 | 0.0009091354 |
6826.749189768143 | 0.001385349 | 0.0005663762 | 0.0003482838 | 0.001261984 | 0.0001208443 | 0.001457999 | 0.001218848 | 0.0008012205 | 0.001230209 | 0.001345645 | 0.0010881 | 0.0008012205 | 0.001567817 | 0.001457999 | 0.001489738 | 0.001345645 | 0.001570157 | 0.001517467 | 0.0009190341 | 0.001330739 | 0.0008012205 | 0.0009190341 |
6792.912078041346 | 0.001399691 | 0.0005728063 | 0.0003523861 | 0.001275227 | 0.0001223712 | 0.001472964 | 0.001231698 | 0.0008100494 | 0.001242074 | 0.001359637 | 0.001099735 | 0.0008100494 | 0.001583532 | 0.001472964 | 0.001504955 | 0.001359637 | 0.001585819 | 0.001532889 | 0.00092903 | 0.001343638 | 0.0008100494 | 0.00092903 |
6759.275441939025 | 0.001414174 | 0.0005793018 | 0.0003565331 | 0.001288603 | 0.0001239155 | 0.001488063 | 0.001244679 | 0.000818966 | 0.001254042 | 0.001373769 | 0.001111473 | 0.000818966 | 0.001599389 | 0.001488063 | 0.001520306 | 0.001373769 | 0.001601627 | 0.001548458 | 0.0009391251 | 0.001356646 | 0.000818966 | 0.0009391251 |
6725.842289260134 | 0.001428784 | 0.0005858628 | 0.0003607231 | 0.001302097 | 0.0001254773 | 0.001503292 | 0.001257774 | 0.0008279711 | 0.00126611 | 0.001388032 | 0.001123323 | 0.0008279711 | 0.001615387 | 0.001503292 | 0.001535803 | 0.001388032 | 0.001617571 | 0.001564163 | 0.0009493169 | 0.001369766 | 0.0008279711 | 0.0009493169 |
6692.615669527125 | 0.001443527 | 0.0005924889 | 0.000364956 | 0.001315724 | 0.0001270567 | 0.001518666 | 0.00127099 | 0.0008370621 | 0.001278279 | 0.001402418 | 0.001135284 | 0.0008370621 | 0.001631522 | 0.001518666 | 0.001551432 | 0.001402418 | 0.00163365 | 0.001580013 | 0.0009596061 | 0.001382997 | 0.0008370621 | 0.0009596061 |
6659.598674094408 | 0.001458402 | 0.0005991799 | 0.0003692319 | 0.001329466 | 0.0001286535 | 0.001534175 | 0.001284326 | 0.0008462387 | 0.001290548 | 0.001416933 | 0.001147356 | 0.0008462387 | 0.001647796 | 0.001534175 | 0.0015672 | 0.001416933 | 0.001649871 | 0.001595997 | 0.0009699912 | 0.001396336 | 0.0008462387 | 0.0009699912 |
6626.7944362263115 | 0.001473408 | 0.0006059357 | 0.0003735522 | 0.001343331 | 0.0001302674 | 0.001549814 | 0.001297783 | 0.0008555016 | 0.001302917 | 0.001431577 | 0.001159538 | 0.0008555016 | 0.001664207 | 0.001549814 | 0.001583098 | 0.001431577 | 0.001666225 | 0.001612113 | 0.0009804726 | 0.001409785 | 0.0008555016 | 0.0009804726 |
6594.206131142701 | 0.001488537 | 0.0006127558 | 0.0003779133 | 0.001357318 | 0.0001318989 | 0.001565586 | 0.001311349 | 0.0008648502 | 0.001315382 | 0.001446348 | 0.001171829 | 0.0008648502 | 0.001680751 | 0.001565586 | 0.001599138 | 0.001446348 | 0.001682709 | 0.001628366 | 0.0009910476 | 0.001423337 | 0.0008648502 | 0.0009910476 |
windrose
direction | windrose_weight |
---|---|
N | 1.0 |
NE | 1.0 |
E | 1.0 |
SE | 10.0 |
S | 1.0 |
SW | 1.0 |
W | 1.0 |
NW | 1.0 |