0

I have an in-memory SQLite database that is being created dynamically based on user input using sqlite3 in Python3.9. I have one update statement that is an issue in terms of speed. I have simplified the case to only include the SQLite and not Python.

TABLES

The table to be updated is called West100_1 which contains two id columns: id and pseudo_id. pseudo_id is NULL in most places, and there are a maximum of 4 values for pseudo_id where the values for id are all the same. All other columns are in the configureation of {taxon]_PL, which in this case are 21 columns. This table can get quite big, upwards of a million rows.

the other tables are:

  • 100_1 a map of points with the same ids as in West100_1, so can also get quite big. Contains columns of the size (resolution) of each point which has two possible values, and the associated community veg_com (and a number of other columns such as geometry that aren't relevant here).
  • sampling_sites which contains the name and point geometry (x and y) of sampling sites. Small table, in this case 2 rows
  • vegcom which contains the names of the vegetation communities and the percentage cover of a taxon in that community. Small table, usually no more than 1000 rows.
  • dist_dir contains a lookup table of the distance and direction from sampling sites for each id. Same number of rows as {map_name}.
  • pseudo_points contains the same lookup table but for the pseudo points. Very small, usually no more than 40 rows.
  • DistanceLookup a lookup table containing per plant taxon a weighting value for each possible distance. Fairly big, usually about a quarter of the number of rows of the main table.
  • windrose a small lookup table with two colums: cardinal direction and weighting. Small, always 8 rows.

QUERY

THe query is essentially a mathmatical formula for which the inputs around taken from the various tables. The query needs to be run multiple times, once for each entry in sampling_sites (example here being "West") * each entry in taxa (example here being Cyperaceae). In this case that means 42 times. The query below is ONE of these queries, but they are otherwise all the same in construction.

UPDATE West100_1 SET "Cyperaceae_PL" =
    (0.87 * 
        (SELECT "vegcom_percent" FROM vegcom WHERE "taxon_code" = "Cyperaceae" AND "veg_com" = 
            (SELECT "veg_com" FROM "100_1" WHERE (id = West100_1.id)
            )
        )
        * 
        (CASE WHEN West100_1.pseudo_id NOT NULL THEN 
            (SELECT "Cyperaceae_DW" FROM DistanceLookup WHERE DistanceLookup.distance = 
                (SELECT "distance" FROM pseudo_points WHERE (pseudo_id = West100_1.pseudo_id)
                )
            )
        ELSE 
            (SELECT "Cyperaceae_DW" FROM DistanceLookup WHERE DistanceLookup.distance = 
                (SELECT "distance" FROM dist_dir WHERE (id = West100_1.id) AND (site_name = "West")
                )
            ) 
        END)
    * 
        (SELECT "windrose_weight" FROM windrose WHERE 
            (CASE WHEN West100_1.pseudo_id is NULL THEN 
                ("direction" = 
                    (SELECT "direction" FROM dist_dir WHERE ((id = West100_1.id) AND (site_name = "West"))
                    )
                ) 
            ELSE 
                ("direction" = 
                    (SELECT "direction" FROM pseudo_points WHERE (pseudo_id = West100_1.pseudo_id)
                    )
                ) 
            END)
        )
    )

I have applied indices wherever I could think to add them, and that results in this query plan. Unfortunately to avoid an error in the overarching programme earlier on I have to copy all the tables which makes using keys impossible as (I think) there exists no way to copy tables with keys intact.

QUERY PLAN

 [(4, 0, 0, 'SCAN West100_1'),
 (21, 0, 0, 'CORRELATED SCALAR SUBQUERY 2'),
 (26, 21, 0, 'SEARCH vegcom USING INDEX sqlite_autoindex_vegcom_1 (veg_com=? AND taxon_code=?)'),
 (28, 21, 0, 'CORRELATED SCALAR SUBQUERY 1'),
 (33, 28, 0, 'SEARCH 100_1 USING INDEX 100_1_idx (id=?)'),
 (57, 0, 0, 'CORRELATED SCALAR SUBQUERY 4'),
 (62, 57, 0, 'SEARCH DistanceLookup USING INDEX sqlite_autoindex_DistanceLookup_1 (distance=?)'),
 (64, 57, 0, 'CORRELATED SCALAR SUBQUERY 3'),
 (69, 64, 0, 'SEARCH pseudo_points USING INDEX sqlite_autoindex_pseudo_points_1 (pseudo_id=?)'),
 (92, 0, 0, 'CORRELATED SCALAR SUBQUERY 6'),
 (97, 92, 0, 'SEARCH DistanceLookup USING INDEX sqlite_autoindex_DistanceLookup_1 (distance=?)'),
 (99, 92, 0, 'CORRELATED SCALAR SUBQUERY 5'),
 (104, 99, 0, 'SEARCH dist_dir USING INDEX dist_dir_id_name_idx (id=? AND site_name=?)'),
 (128, 0, 0, 'CORRELATED SCALAR SUBQUERY 9'),
 (132, 128, 0, 'SCAN windrose'),
 (138, 128, 0, 'CORRELATED SCALAR SUBQUERY 7'),
 (143, 138, 0, 'SEARCH dist_dir USING INDEX dist_dir_id_name_idx (id=? AND site_name=?)'),
 (160, 128, 0, 'CORRELATED SCALAR SUBQUERY 8'),
 (165, 160, 0, 'SEARCH pseudo_points USING INDEX sqlite_autoindex_pseudo_points_1 (pseudo_id=?)')]

I am not very proficient at reading the query plans, is there any index that I am missing?

In an example where West100_1 has 284903 points, the combined 41 queries take ~60 seconds to run, so about 1.5 seconds per query. In the further code this needs to be done many times (more than 1000 times), causing the code to run for about a day so I would really like to shave off time any way I can.

table examples (top 10 rows):

West100_1 (before filling)

msa_id pseudo_id Alnus_PL Apiaceae_PL Ast_lactuceae_PL Betula_PL Cerealia_PL Chenopodiaceae_PL Corylus_PL Cyperaceae_PL Filipendula_PL Fraxinus_PL Plantago_PL Quercus_PL Ranunculaceae_PL Rubiaceae_PL Rumex_PL Salix_PL Sambucus_PL Solidago_PL Tilia_PL Urtica_PL Poaceae_PL Ulmus_PL
1 [NULL]
2 [NULL]
3 [NULL]
4 [NULL]
5 [NULL]
6 [NULL]
7 [NULL]
8 [NULL]
9 [NULL]
10 [NULL]

100_1

msa_id geom_X geom_Y veg_com chance_to_happen resolution lidar_su1 50m_basin 2m_basin 5m_basin 10m_basin ESB_DESC
53607 456224.3438 412147.5 Alnus Carr 0 1 4.178999901 PEAT
53606 456223.3438 412147.5 Alnus Carr 0 1 4.179999828
53609 456226.3438 412147.5 Alnus Carr 0 1 4.209000111 PEAT
53608 456225.3438 412147.5 Alnus Carr 0 1 4.196000099 PEAT
53603 456220.3438 412147.5 Alnus Carr 0 1 4.175000191 PEAT
53602 456219.3438 412147.5 Alnus Carr 0 1 4.169000149 PEAT
53605 456222.3438 412147.5 Alnus Carr 0 1 4.188000202 PEAT
53604 456221.3438 412147.5 Phramitetalia_Fen 0 1 4.191999912 PEAT
53599 456216.3438 412147.5 Alnus Carr 0 1 4.164000034 PEAT
53598 456215.3438 412147.5 Alnus Carr 0 1 4.168000221 PEAT

sampling sites

site_name sample_x sample_y sample_is_lake snapped_x snapped_y msa_id
West 456328.7 412024.2 no 456328.3438 412048.5 88361
East 456811.0 412150.0 no 456811.3438 412174.5 219294

vegcom

veg_com taxon_code vegcom_percent
Quercus_woodland Alnus 0.0
Quercus_woodland Betula 1.0
Quercus_woodland Cyperaceae 0.0
Quercus_woodland Filipendula 0.0
Quercus_woodland Quercus 68.0
Quercus_woodland Poaceae 5.0
Quercus_woodland Rubiaceae 0.0
Quercus_woodland Salix 0.0
Quercus_woodland Sambucus 0.0
Quercus_woodland Urtica 0.0

dist_dir

msa_id site_name geom_X geom_Y distance direction
1 West 451594.8438 417062.0 6895.013016666466 SE
2 West 451644.8438 417062.0 6860.783810906739 SE
3 West 451694.8438 417062.0 6826.749189768143 SE
4 West 451744.8438 417062.0 6792.912078041346 SE
5 West 451794.8438 417062.0 6759.275441939025 SE
6 West 451844.8438 417062.0 6725.842289260134 SE
7 West 451894.8438 417062.0 6692.615669527125 SE
8 West 451944.8438 417062.0 6659.598674094408 SE
9 West 451994.8438 417062.0 6626.7944362263115 SE
10 West 452044.8438 417062.0 6594.206131142701 SE

pseudo_points

pseudo_id site_name msa_id direction distance geom_X geom_Y
0 West 88361 NE 17.67766952966369 456353.3438 412073.5
1 West 88361 SE 17.67766952966369 456353.3438 412023.5
2 West 88361 SW 17.67766952966369 456303.3438 412023.5
3 West 88361 NW 17.67766952966369 456303.3438 412073.5
4 East 219294 NE 17.67766952966369 456836.3438 412199.5
5 East 219294 SE 17.67766952966369 456836.3438 412149.5
6 East 219294 SW 17.67766952966369 456786.3438 412149.5
7 East 219294 NW 17.67766952966369 456786.3438 412199.5

DistanceLookup

distance Alnus_DW Apiaceae_DW Ast_lactuceae_DW Betula_DW Cerealia_DW Chenopodiaceae_DW Corylus_DW Cyperaceae_DW Filipendula_DW Fraxinus_DW Plantago_DW Quercus_DW Ranunculaceae_DW Rubiaceae_DW Rumex_DW Salix_DW Sambucus_DW Solidago_DW Tilia_DW Urtica_DW Poaceae_DW Ulmus_DW
6895.013016666466 0.001357056 0.0005537155 0.0003402101 0.001235861 0.0001178434 0.001428499 0.001193509 0.0007838263 0.001206789 0.001318041 0.001065177 0.0007838263 0.001536813 0.001428499 0.001459729 0.001318041 0.001539254 0.001487034 0.0008993352 0.001305283 0.0007838263 0.0008993352
6860.783810906739 0.001371135 0.0005600131 0.0003442246 0.001248859 0.0001193351 0.001443179 0.001206113 0.0007924783 0.001218448 0.001331777 0.001076586 0.0007924783 0.001552245 0.001443179 0.001474663 0.001331777 0.001554632 0.001502182 0.0009091354 0.001317954 0.0007924783 0.0009091354
6826.749189768143 0.001385349 0.0005663762 0.0003482838 0.001261984 0.0001208443 0.001457999 0.001218848 0.0008012205 0.001230209 0.001345645 0.0010881 0.0008012205 0.001567817 0.001457999 0.001489738 0.001345645 0.001570157 0.001517467 0.0009190341 0.001330739 0.0008012205 0.0009190341
6792.912078041346 0.001399691 0.0005728063 0.0003523861 0.001275227 0.0001223712 0.001472964 0.001231698 0.0008100494 0.001242074 0.001359637 0.001099735 0.0008100494 0.001583532 0.001472964 0.001504955 0.001359637 0.001585819 0.001532889 0.00092903 0.001343638 0.0008100494 0.00092903
6759.275441939025 0.001414174 0.0005793018 0.0003565331 0.001288603 0.0001239155 0.001488063 0.001244679 0.000818966 0.001254042 0.001373769 0.001111473 0.000818966 0.001599389 0.001488063 0.001520306 0.001373769 0.001601627 0.001548458 0.0009391251 0.001356646 0.000818966 0.0009391251
6725.842289260134 0.001428784 0.0005858628 0.0003607231 0.001302097 0.0001254773 0.001503292 0.001257774 0.0008279711 0.00126611 0.001388032 0.001123323 0.0008279711 0.001615387 0.001503292 0.001535803 0.001388032 0.001617571 0.001564163 0.0009493169 0.001369766 0.0008279711 0.0009493169
6692.615669527125 0.001443527 0.0005924889 0.000364956 0.001315724 0.0001270567 0.001518666 0.00127099 0.0008370621 0.001278279 0.001402418 0.001135284 0.0008370621 0.001631522 0.001518666 0.001551432 0.001402418 0.00163365 0.001580013 0.0009596061 0.001382997 0.0008370621 0.0009596061
6659.598674094408 0.001458402 0.0005991799 0.0003692319 0.001329466 0.0001286535 0.001534175 0.001284326 0.0008462387 0.001290548 0.001416933 0.001147356 0.0008462387 0.001647796 0.001534175 0.0015672 0.001416933 0.001649871 0.001595997 0.0009699912 0.001396336 0.0008462387 0.0009699912
6626.7944362263115 0.001473408 0.0006059357 0.0003735522 0.001343331 0.0001302674 0.001549814 0.001297783 0.0008555016 0.001302917 0.001431577 0.001159538 0.0008555016 0.001664207 0.001549814 0.001583098 0.001431577 0.001666225 0.001612113 0.0009804726 0.001409785 0.0008555016 0.0009804726
6594.206131142701 0.001488537 0.0006127558 0.0003779133 0.001357318 0.0001318989 0.001565586 0.001311349 0.0008648502 0.001315382 0.001446348 0.001171829 0.0008648502 0.001680751 0.001565586 0.001599138 0.001446348 0.001682709 0.001628366 0.0009910476 0.001423337 0.0008648502 0.0009910476

windrose

direction windrose_weight
N 1.0
NE 1.0
E 1.0
SE 10.0
S 1.0
SW 1.0
W 1.0
NW 1.0

0 Answers0