
I have written a script to restore a huge mysqldump file in parallel

https://gist.github.com/arunp0/4c34caee2432a6f11b91b99dfd2c5ef3

Is it okay to split the dump file into parts and restore them in parallel?

Or are there any suggestions for improvements to reduce the restore time?

Just to explain how the script works:

sed -n -e '/DROP TABLE/,/--/p'  "${restoreFile}" > "${tableMetaFile}"

This creates a new SQL file containing the DROP TABLE and CREATE TABLE commands.

Then this file is restored (not in parallel), so all the tables are created before the data is restored.
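
For reference, that metadata restore is just a single serial client invocation; a minimal sketch, assuming the same connection variables used in the restore command further down:

    # restore DROP TABLE / CREATE TABLE statements serially, before any data
    mysql --protocol=TCP --port "$PORT" --host "$HOST" -u "${USER}" -p"${PASSWORD}" -f "${DATABASE}" < "${tableMetaFile}"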

  # Creates a new file with the data for one table (TableName) only
  grep -E "^INSERT INTO \`${TableName}\`" "${restoreFile}" > "tmp/${TableName}.sql"
  # That file is split into as many chunks as there are CPUs
  split -n l/$(nproc) --additional-suffix=.sql "tmp/${TableName}.sql" tmp/"${TableName}"/output.
  # This command is used to restore each chunk
  pv -f tmp/meta/pre.sql "${item}" tmp/meta/post.sql | mysql --protocol=TCP --port "$PORT" --host "$HOST" -u "${USER}" -p"${PASSWORD}" -f "${DATABASE}" &
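
The pre.sql and post.sql wrappers piped around each chunk are not shown in the snippet; a minimal sketch of how they could be generated, assuming their purpose is to relax per-row checks during the bulk load (the exact contents in the gist may differ):

    # session-level speedups applied before each chunk is loaded
    printf '%s\n' \
      'SET foreign_key_checks = 0;' \
      'SET unique_checks = 0;' \
      'SET autocommit = 0;' > tmp/meta/pre.sql

    # re-enable the checks and commit after the chunk
    printf '%s\n' \
      'COMMIT;' \
      'SET unique_checks = 1;' \
      'SET foreign_key_checks = 1;' > tmp/meta/post.sql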
  • *Is it okay to split the dump file into parts and restore them in parallel?* In general, no. A dump file contains many steps whose relative order must be strictly preserved; the next step cannot be started until the previous one has fully finished. – Akina Feb 14 '20 at 09:24
  • Even if we disable keys, restore from scratch, and can trust the dump file? – arunp9294 Feb 14 '20 at 09:29
  • Imagine that you begin inserting while the table has not yet been created by another thread. Or imagine that you restore a foreign key or trigger before the insertion has finished. – Akina Feb 14 '20 at 09:33
  • I have disabled foreign key checks, and all tables are created before the data is restored. – arunp9294 Feb 14 '20 at 10:44
  • This approach works (and is used by other tools). Without checking your script, some concerns/ideas: a) you (hopefully) dump more often than you import, so doing parallel dumps (which then generate parallel export files) might be another way to go; b) you may need more logic when generating your metadata file (e.g. triggers need to be added afterwards); c) check for multi-database exports (as I assume they break your script); d) I didn't check your import order, but since you split each table by the number of CPUs: it might be more efficient to import different tables in parallel than the same one. – Solarflare Feb 14 '20 at 12:27
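
For illustration, a minimal sketch of Solarflare's point d) (importing different tables in parallel rather than chunks of the same table), assuming one tmp/<TableName>.sql file per table as produced by the grep step above. The xargs fan-out is my own addition, not part of the script:

    # assumes HOST, PORT, USER, PASSWORD and DATABASE are exported;
    # runs one mysql client per table file, at most $(nproc) at a time
    find tmp -maxdepth 1 -name '*.sql' -print0 |
      xargs -0 -P "$(nproc)" -I {} sh -c \
        'mysql --protocol=TCP --port "$PORT" --host "$HOST" -u "$USER" -p"$PASSWORD" -f "$DATABASE" < "$1"' _ {}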

0 Answers