I'm trying to port a database from MySQL to PostgreSQL. I've rebuilt the schema in Postgres, so all I need to do is get the data across, without recreating the tables.
I could do this with code that iterates all the records and inserts them one at a time, but I tried that and it's waaayyyy to slow for our database size, so I'm trying to use mysqldump and a pipe into psql instead (once per table, which I may parallelize once I get it working).
I've had to jump through various hoops to get this far, turning on and off various flags to get a dump that is vaguely sane. Again, this only dumps the INSERT INTO, since I've already prepared the empty schema to get the data into:
/usr/bin/env \
PGPASSWORD=mypassword \
mysqldump \
-h mysql-server \
-u mysql-username \
--password=mysql-password \
mysql-database-name \
table-name \
--compatible=postgresql \
--compact \
-e -c -t \
--default-character-set=utf8 \
| sed "s/\\\\\\'/\\'\\'/g" \
| psql \
-h postgresql-server \
--username=postgresql-username \
postgresql-database-name
Everything except that ugly sed
command is manageable. I'm doing that sed
to try and convert MySQL's approach to quoting single-quotes inside of strings ('O\'Connor'
) o PostgreSQL's quoting requirements ('O''Connor'
). It works, until there are strings like this in the dump: 'String ending with a backslash \\'
... and yes, it seems there is some user input in our database that has this format, which is perfectly valid, but doesn't pass my sed
command. I could add a lookbehind to the sed
command, but I feel like I'm crawling into a rabbit hole. Is there a way to either:
a) Tell mysqldump to quote single quotes by doubling them up b) Tell psql to expect backslashes to be interpreted as quoting escapes?
I have another issue with BINARY
and bytea
differences, but I've worked around that with a base64 encoding/decoding phase.
EDIT | Looks like I can do (b) with set backslash_quote = on; set standard_conforming_strings = off;
, though I'm not sure how to inject that into the start of the piped output.