I face many restrictions on where I can run scripts, where I can reach the source and target servers, and where I have enough disk space to stage the data for the task.
I get my zipped database dump from the hosting provider.
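The fetch itself is just a copy over scp; the host, path and file name below are placeholders:

```bash
# Pull the compressed dump from the provider into a local staging area.
scp backups@provider.example.com:/backups/dump.sql.gz /staging/
```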
I split the unzipped dump so the INSERT INTO lines go into one file and everything else goes into a second one.
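A minimal sketch of the split, done in a single pass with awk; the file names are placeholders:

```bash
# Route INSERT lines to one file and everything else (schema, settings)
# to another in a single pass over the unzipped dump.
awk '/^INSERT INTO/ { print > "inserts.sql"; next }
                    { print > "schema.sql" }' dump.sql
```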
Then I create the database structures from the second one.
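Loading the structures is then a plain mysql run against the schema-only file; host and database names below are placeholders:

```bash
# Create tables, indexes and other objects before any data is loaded.
mysql --host=db.example.com targetdb < schema.sql
```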
I convert the INSERT INTO statements into per-table CSV files.
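A rough sketch of that conversion with GNU awk, assuming the dump has one row per INSERT statement (--skip-extended-insert style) and the values contain no embedded commas, quotes or newlines; a dump with extended INSERTs needs a more careful parser:

```bash
# For each INSERT line, capture the table name and the raw value list,
# then append the values to a per-table CSV file.
gawk 'match($0, /^INSERT INTO `([^`]+)` VALUES \((.*)\);$/, m) {
        # m[1] = table name, m[2] = comma-separated row values
        print m[2] >> (m[1] ".csv")
      }' inserts.sql
```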
Finally, I upload the CSV files in parallel (up to 50 tables concurrently). This way a 130GB text dump is cloned in 3 hours, instead of the 17 hours it would take with the statement-by-statement method.
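A minimal sketch of the parallel load, using xargs to keep up to 50 mysqlimport processes running; the host and database names are placeholders and credentials are assumed to come from ~/.my.cnf. mysqlimport derives the target table name from each file's base name:

```bash
# Load every per-table CSV, at most 50 tables at a time.
ls *.csv | xargs -n 1 -P 50 \
  mysqlimport --local \
    --fields-terminated-by=',' \
    --fields-optionally-enclosed-by="'" \
    --host=db.example.com targetdb
```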
Those 3 hours include:
- the copy over (10 minutes),
- a sanity check (10 minutes), and
- filtering of the logs (10 minutes), since only log entries from the latest academic year are needed (see the sketch after this list).
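The log filtering is just a line filter on the converted CSVs. A purely hypothetical sketch, assuming the academic year runs September 2023 to August 2024 and the dates appear in ISO format; the file name and pattern depend on the actual table:

```bash
# Keep only rows dated within the assumed academic year (2023-09 .. 2024-08).
grep -E "'(2023-(09|1[0-2])|2024-0[1-8])-" access_log.csv > access_log.filtered.csv
```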
The remote zipped file is between 7GB and 13GB, pulled over a 40 MBps line.
The upload goes to a remote server, also over a 40 MBps line.
If your MySQL server is local, the upload step can be faster.
I utilise scp, gzip, zgrep, sed, awk, ps, mysqlimport and mysql, plus some other utilities (pv, rg, pigz) to speed up decompression and filtering when they are available.
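For example, when pigz is installed the decompression uses all cores instead of one, and pv gives a progress read-out. A sketch, with pv assumed to be present:

```bash
# Prefer parallel decompression with pigz, fall back to plain gzip.
if command -v pigz >/dev/null 2>&1; then
  pv dump.sql.gz | pigz -dc > dump.sql
else
  pv dump.sql.gz | gzip -dc > dump.sql
fi
```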
If I had direct access to the database server, LVM snapshots of the data volume would be the preferred solution, giving you speeds limited only by the copy speed of the storage media.
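For reference, that approach would look roughly like the sketch below; the volume names, sizes and mount points are placeholders, and the snapshot should ideally be taken while the tables are flushed and locked for an instant:

```bash
# Snapshot the logical volume holding the MySQL datadir, copy it off,
# then drop the snapshot.
lvcreate --snapshot --size 20G --name mysql_snap /dev/vg0/mysql
mount -o ro /dev/vg0/mysql_snap /mnt/mysql_snap
# ... copy /mnt/mysql_snap to the target server (rsync, scp, etc.) ...
umount /mnt/mysql_snap
lvremove -f /dev/vg0/mysql_snap
```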