
To establish precedent, I reviewed the following:

  • Amazon-RDS database to Dedicated server
  • Mysqldump of the AWS RDS

I have not, however, found a clear-cut solution to my issues in these, only ancillary factoids throughout. I wanted to put together a comprehensive, specific guide to moving data into/around AWS RDS.

I did have a discussion at the Percona MySQL Performance Conference with a DBA consultant who works with RDS. He suggested the following, and apparently this is a frequent issue. I wanted additional input in an effort to help everyone.

**This is important for RDS users, according to a large MySQL consulting firm and the number of hands raised at the talk.**

  • Goal #1: Get the data out of your RDS instance. You can't move between major releases within RDS, and maybe you want to take it to an EC2 instance running Percona Server, etc.
  • Wish-list Goal #1a: Secondarily, clean/QA the data if possible.
  • Goal #2: Get the data back into another, smaller RDS instance, or another flavor/platform.

The facts:

  1. I have a large dataset (40-50 GB, primarily utf8, some latin1) in MySQL 5.5.x (< 5.5.8).
     1a. The old admin, for whatever reason, set our instance size to 1000 GB of dedicated storage that is mostly just padding (?).
  2. All tables are now, obviously, InnoDB; a couple of tables are approximately 14 GB each. That's another issue.
  3. Multiple encodings may exist: perhaps utf8, latin1, and Swedish-fkafkakfk? (See the query sketch below.)
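
To pin down fact #3 before picking a dump strategy, I plan to inventory what the server actually stores. A minimal sketch, where `username`, `your-rds-endpoint`, and `dbname` are placeholders for my own instance:

    # List the character set and collation actually in use, per column,
    # so it's clear up front what latin1 can and can't hold.
    mysql -u username -p -h your-rds-endpoint --batch -e "
        SELECT TABLE_NAME, COLUMN_NAME, CHARACTER_SET_NAME, COLLATION_NAME
        FROM information_schema.COLUMNS
        WHERE TABLE_SCHEMA = 'dbname'
          AND CHARACTER_SET_NAME IS NOT NULL
        ORDER BY CHARACTER_SET_NAME, TABLE_NAME;"

Anything that comes back as latin1_swedish_ci is where the "Swedish" is coming from.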

So with that, these are the options I am debating proceeding with:

Option A: Straight .sql dump; .sql reload

  1. Dump the schema, tables, etc. en masse as a standard executable .sql file, then reload.
  2. So I was thinking latin1 for portability:

         mysqldump -u username -p --default-character-set=latin1 -N database > backup.sql
         mysql -u username -p --default-character-set=latin1 database < backup.sql

Questions RE: Option A: Any suggestions on the commands above for chunking, integrity, and otherwise guaranteeing a smooth dump and reload? Any contingencies for columns that the information schema shows with variable encodings (i.e., an algorithm for handling what latin1 can't represent)?
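
Nothing is settled, but what I had in mind for the chunking/integrity side is roughly this untested sketch (endpoint and credentials are placeholders; note each table gets its own snapshot rather than one global one, so writes would need to stop if cross-table consistency matters):

    # Per-table dumps, each from a consistent InnoDB snapshot, with a checksum
    # per file so a failed reload can be retried one table at a time.
    # The inline password is a placeholder; a ~/.my.cnf would be cleaner.
    for TABLE in $(mysql -u username -p'secret' -h old-rds-endpoint -N -B \
                   -e "SHOW TABLES" database); do
        mysqldump -u username -p'secret' -h old-rds-endpoint \
                  --single-transaction --quick \
                  --default-character-set=latin1 \
                  database "$TABLE" > "backup_${TABLE}.sql"
        md5sum "backup_${TABLE}.sql" >> backup.md5
    done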

Option B: Individual ASCII file dumps of tables with schema + QA/schema refinement

  1. Dump these data in straight ASCII (charset? UTF-8? I must be careful?) into separate files per table, perhaps in chunks, for data QA.

     I would proceed with the following to output TSV data and schema:

         mysqldump --user=dbuser --password --tab=~/output/dir dbname

  2. Secondarily, run the output through some Perl/Python to clean up possible errant junk chars, encoding issues, etc. from 8 years of 5 different DBAs and about 12 different data input formats/file types. (A rough sketch of that pass follows this list.)
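
I haven't written that pass yet; as a placeholder, here is a minimal shell sketch of the kind of checks I mean (the real version would be Perl/Python), assuming the --tab dump above landed its .txt data files in ~/output/dir:

    # Flag files containing bytes that are not valid UTF-8, and strip ASCII
    # control characters (everything below 0x20 except tab and newline) into
    # *.clean.txt copies for inspection.
    for f in ~/output/dir/*.txt; do
        iconv -f UTF-8 -t UTF-8 "$f" > /dev/null 2>&1 || echo "non-UTF-8 bytes: $f"
        tr -d '\000-\010\013\014\016-\037' < "$f" > "${f%.txt}.clean.txt"
    done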

Questions RE: Option B:

  • My data has a lot of garbage chars that are true to the data; is pipe-delimited the best choice?
  • I've had horrendous errors loading into AWS RDS from TSVs and other basic dumps; any suggestions beyond what is published in their data loading whitepaper? (See the reload sketch below.)
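
For reference, the reload direction I've been testing looks roughly like this (sketch only; the endpoint, path, and tablename are placeholders, and it assumes the target instance permits LOAD DATA LOCAL INFILE):

    # Reload one pipe-delimited file into the target RDS instance.
    # LOCAL is needed because we can't place files on the RDS host itself.
    mysql --user=dbuser --password --host=new-rds-endpoint --local-infile=1 dbname \
          -e "LOAD DATA LOCAL INFILE '/output/dir/tablename.txt'
              INTO TABLE tablename
              CHARACTER SET utf8
              FIELDS TERMINATED BY '|' OPTIONALLY ENCLOSED BY '\"'
              LINES TERMINATED BY '\n';"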

1 Answer


I would strongly suggest Option C, which is to use an ETL tool to move the data. I have used the Community edition of Pentaho Data Integration very successfully to do the sort of things you're asking about (especially the cleanup). It's more work than a simple dump and load, but you have very good control and can do things like sync up again later and so on.
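
Once a transformation is built in the Spoon GUI, you can also run it headlessly, which is what makes the "sync up again later" part practical. A rough sketch (paths and file names are placeholders for whatever you build):

    # Run a saved PDI/Kettle transformation from the command line.
    ./pan.sh -file=/path/to/rds_migration.ktr -level=Basic

    # Jobs (e.g. one that loops a transformation over a list of tables)
    # run the same way with kitchen.sh.
    ./kitchen.sh -file=/path/to/rds_migration_job.kjb -level=Basic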
