I am trying to back up some data from a table with > 50M rows. The table looks like this:
```
#
entry_id : int unsigned auto_increment  # unique entry id
---
time = CURRENT_TIMESTAMP : timestamp    # current timestamp
room : varchar(255)                     # Room
id : varchar(255)                       # Sensor ID / NAME
value : double                          # sensor value
```
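For reference, this is roughly how the table is declared in DataJoint (the schema name `sensors` and the class name `SensorReading` are placeholders, not the real names):

```python
import datajoint as dj

schema = dj.schema('sensors')  # placeholder schema name


@schema
class SensorReading(dj.Manual):  # placeholder class name
    definition = """
    entry_id : int unsigned auto_increment  # unique entry id
    ---
    time = CURRENT_TIMESTAMP : timestamp    # current timestamp
    room : varchar(255)                     # Room
    id : varchar(255)                       # Sensor ID / NAME
    value : double                          # sensor value
    """
```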
My plan is to (see the sketch after this list):

- Fetch a limited number of KEYS, i.e. `fetch('KEY', limit=some_large_number)`
- Retrieve the table entries for those KEYS
- Save the data chunk to disk
- Delete the table entries belonging to those KEYS
- Start over ...
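In code, one iteration of that loop looks roughly like this (assuming the `SensorReading` declaration sketched above; the chunk size, file names, the use of pandas/CSV for writing, and `delete_quick()` to skip the confirmation prompt are just placeholder choices):

```python
import pandas as pd

chunk_size = 1_000_000  # placeholder chunk size
part = 0

while True:
    # 1) fetch a limited batch of primary keys
    keys = SensorReading.fetch('KEY', limit=chunk_size)
    if len(keys) == 0:
        break

    # 2) retrieve the table entries for those keys  <-- this is where it gets slow
    rows = (SensorReading & keys).fetch(as_dict=True)

    # 3) save the chunk to disk (CSV only as an example format)
    pd.DataFrame(rows).to_csv(f'sensor_backup_{part:04d}.csv', index=False)

    # 4) delete the entries that were just backed up
    (SensorReading & keys).delete_quick()

    part += 1
```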
Just fetching, say, 1_000_000 entries is reasonably fast (a few seconds). However, once I go through the full cycle of retrieving primary keys -> fetching the data for those keys -> deleting those entries, I am basically stuck at the data-fetching step (>20 minutes for ~100_000 entries, or a timeout(?)).
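To make the comparison concrete, these are roughly the two calls I am timing (same assumed `SensorReading` table as above; the numbers are the ones quoted here):

```python
# Fast: a plain limited fetch takes only a few seconds for ~1_000_000 rows.
rows = SensorReading.fetch(limit=1_000_000)

# Slow: fetching the same data via a key restriction takes >20 minutes
# (or times out) already for ~100_000 keys.
keys = SensorReading.fetch('KEY', limit=100_000)
rows = (SensorReading & keys).fetch()
```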
What is the most elegant / time-efficient way to perform this task using DataJoint logic?
(DataJoint Python version 0.13.2)