I have three tables:
grade (grade_id, grade_value, grade_date) ~100M rows
grade_archive (grade_id, grade_value, grade_date) 0 rows
peer_review (grade_id, peer_review_value, peer_review_date) ~10M rows
I want to move all rows from table grade
to grade_archive
that are older than a month and are not in table peer_review
.
The tables are actively used so any insert would have to be low priority to avoid interrupting existing and new processes while this runs.
Expected table rows should look something like this when done:
grade ~10M rows
grade_archive ~90M rows
peer_review ~10M rows
I imagine it's something close to:
INSERT
LOW_PRIORITY
INTO grade_archive
(grade_id,grade_value,grade_date)
SELECT
grade_id,grade_value,grade_date
FROM
grade
WHERE
grade_date < DATE_ADD(NOW(), INTERVAL -1 MONTH)
AND grade_id NOT IN
(
SELECT grade_id FROM peer_review
);
And then clean up the grade
table by removing all rows in the archive table:
DELETE LOW_PRIORITY FROM grade WHERE grade_id IN (SELECT grade_id FROM grade_archive);
But these subselects are very slow with large tables and I'm nervous about the result. Looking for some better direction.