I am using MonetDB (MDB) for OLAP queries. I am storing source data in PostgreSQL (PGSQL) and syncing it with MonetDB in batches written in Python.
In PGSQL there is a wide table with a non-unique ID column and a few other columns. Every few seconds, a Python script takes a batch of 10k records that changed in PGSQL and uploads them to MDB.
The upload to MDB works as follows:
- Create a staging table in MDB.
- Use `COPY` to upload the 10k records into the staging table.
- `DELETE` from the destination table all IDs that are in the staging table.
- `INSERT` into the destination table all rows from the staging table.
So it is basically a `DELETE` & `INSERT`. I cannot use a `MERGE` statement, because I do not have a PK - one ID can map to multiple rows in the destination. So I need to do a delete and a full insert for all IDs in the batch currently being synced.
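For illustration, one batch looks roughly like this (a minimal sketch using pymonetdb; the table names, credentials and the CSV path are simplified, and the file has to be readable by the MonetDB server):

```python
import pymonetdb

# Rough sketch of one batch; names and paths are placeholders.
conn = pymonetdb.connect(database="warehouse", hostname="localhost",
                         username="monetdb", password="monetdb")
cur = conn.cursor()

# 1) staging table with the same layout as the destination
cur.execute("CREATE TABLE staging AS SELECT * FROM destination WITH NO DATA")

# 2) bulk-load the 10k changed rows exported from PGSQL
cur.execute("COPY 10000 RECORDS INTO staging FROM '/tmp/batch.csv' "
            "USING DELIMITERS ',', E'\\n', '\"'")

# 3) drop every ID that appears in the batch from the destination ...
cur.execute("DELETE FROM destination WHERE id IN (SELECT id FROM staging)")

# 4) ... and re-insert the full set of rows for those IDs
cur.execute("INSERT INTO destination SELECT * FROM staging")

cur.execute("DROP TABLE staging")
conn.commit()
```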
Now to the problem: the `DELETE` is slow.
When I `DELETE` 10k records from the destination table of 25M rows, it takes about 500 ms.
However! If I first run a simple `SELECT * FROM destination WHERE id = 1` and THEN do the `DELETE`, it takes 2 ms.
I think it has something to do with the automatic creation of auxiliary indices, but this is where my knowledge ends.
I tried to solve this by "pre-heating" - doing the lookup myself - and it works, but only for the first `DELETE` after the pre-heat.
Once I do the `DELETE` and `INSERT`, the next `DELETE` is slow again. And doing the pre-heating before every `DELETE` makes no sense, because the pre-heat itself takes 500 ms.
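To make the pre-heat experiment concrete, this is roughly what I time (a sketch only: connection setup as in the snippet above, the `timed` helper is just illustrative, and it assumes the staging table has been reloaded for each batch):

```python
import time
import pymonetdb

conn = pymonetdb.connect(database="warehouse", hostname="localhost",
                         username="monetdb", password="monetdb")
cur = conn.cursor()

def timed(sql):
    """Execute a statement and print a rough wall-clock timing."""
    t0 = time.perf_counter()
    cur.execute(sql)
    print(f"{(time.perf_counter() - t0) * 1000:6.1f} ms  {sql[:60]}")

# Batch N without pre-heat: the DELETE alone takes ~500 ms.
timed("DELETE FROM destination WHERE id IN (SELECT id FROM staging)")
timed("INSERT INTO destination SELECT * FROM staging")

# Batch N+1 with pre-heat: the point lookup itself takes ~500 ms and the
# DELETE then runs in ~2 ms, but the INSERT seems to invalidate whatever
# auxiliary structure was built, so the next batch is slow again.
timed("SELECT * FROM destination WHERE id = 1")
timed("DELETE FROM destination WHERE id IN (SELECT id FROM staging)")
timed("INSERT INTO destination SELECT * FROM staging")
```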
Is there any way to sync data into MDB without invalidating the auxiliary indices that have already been built? Or to make the `DELETE` fast without the pre-heat? Or should I use some different technique to sync data into MDB without a PK (does `MERGE` have the same problem?)?
Thanks!