To establish precedent, I reviewed the following:
- Amazon-RDS database to Dedicated server
- Mysqldump of the AWS RDS
However, I have not found a clear-cut solution to my issues in these, only ancillary factoids scattered throughout. I wanted to put together a comprehensive, specific guide to moving data into/around AWS RDS.
I did have a discussion at the Percona MySQL Performance Conference with a DBA consultant who works with RDS. He suggested the following, and this is apparently a frequent issue; I wanted additional input in an effort to help everyone.
**This is important for RDS users, according to a large MySQL consulting firm and the number of hands raised at the talk.**
- Goal #1: Get the data out of your RDS instance. You can't move between major releases within RDS, and maybe you want to take it to an EC2 instance running Percona, etc.
- Wish-list Goal #1a: Secondarily, clean/QA the data if possible.
- Goal #2: Get the data back into another, smaller RDS instance, or onto a different flavor/platform.
The facts:
- I have a large dataset (40-50 GB, primarily utf-8 with some latin-1) in MySQL 5.5.x (<5.5.8).
- 1a. The old admin, for whatever reason, set our instance size to 1000 GB of dedicated storage that is just padding (?).
- All tables are now, obviously, InnoDB; a couple of tables are approximately 14 GB each. That's another issue.
- Mixed encodings may exist: perhaps utf-8, latin-1, and the Swedish collation (latin1_swedish_ci)?
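To see which encodings are actually in play before committing to a dump charset, something like the following audit should list every column's character set and collation (endpoint, user, and schema names here are placeholders for our setup):

    # Placeholder endpoint/user/schema -- list per-column character sets and
    # collations so the dump charset is chosen deliberately, not guessed.
    mysql -h myinstance.rds.amazonaws.com -u dbuser -p --batch -e "
      SELECT table_name, column_name, character_set_name, collation_name
      FROM information_schema.columns
      WHERE table_schema = 'dbname' AND character_set_name IS NOT NULL
      ORDER BY character_set_name, table_name;"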
With that said, these are the options I am debating:
Option A: Straight .sql dump; .sql reload
- Dump the schema and all tables en masse to standard .sql statements, then reload.
- **So I was thinking latin1 for portability:**

    mysqldump -u username -p --default-character-set=latin1 -N database > backup.sql
    mysql -u username -p --default-character-set=latin1 database < backup.sql
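A variant of the same pair I was considering, keeping the latin1 choice above but adding the usual consistency flags (endpoints are placeholders): a non-locking InnoDB snapshot, streamed rows, binary-safe output, and stored routines/triggers/events included.

    # Consistent InnoDB snapshot without locking, stream rows rather than
    # buffering them, hex-encode binary columns, include routines/triggers/events.
    mysqldump -h source-endpoint -u username -p \
      --single-transaction --quick --hex-blob \
      --routines --triggers --events \
      --default-character-set=latin1 database > backup.sql

    # Reload with the same client character set.
    mysql -h target-endpoint -u username -p \
      --default-character-set=latin1 database < backup.sql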
Question RE: Option A: Suggestions on the above, for chunking, integrity, and otherwise guaranteeing a smooth dump and reload? Contingencies for columns that information_schema shows with variable encodings (an algorithm for handling what latin1 can't represent)?
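For the chunking part, roughly what I was picturing: dump table by table and record CHECKSUM TABLE values before and after the move as a coarse sanity check. Names and the PASS variable are placeholders, and per-table dumps are only consistent within each table, not across the whole schema.

    # List base tables, then dump each one to its own file and record a
    # checksum to compare against the target after reload.
    TABLES=$(mysql -h source-endpoint -u dbuser -p"$PASS" -N -B \
      -e "SELECT table_name FROM information_schema.tables
          WHERE table_schema='dbname' AND table_type='BASE TABLE';")

    for T in $TABLES; do
      mysql -h source-endpoint -u dbuser -p"$PASS" -N -B \
        -e "CHECKSUM TABLE dbname.$T;" >> checksums_before.txt
      mysqldump -h source-endpoint -u dbuser -p"$PASS" \
        --single-transaction --default-character-set=latin1 \
        dbname "$T" > "dump_$T.sql"
    done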
Option B: Individual ASCII file dumps of tables w/ schema + QA/schema refinement
Dump the data as plain text (charset? UTF-8? I must be careful here) into separate per-table files, perhaps in chunks, for data QA.
Would proceed with the following to output TSV data and schema:

    mysqldump --user=dbuser --password --tab=~/output/dir dbname
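One caveat I want to flag on --tab: as I understand it, the data files are written by the server itself via SELECT ... INTO OUTFILE, and since RDS gives no filesystem access on the instance (and no FILE privilege), that form likely only works against a self-hosted copy. A client-side per-table alternative I'm considering looks roughly like the following (placeholder names, reusing the TABLES list from the loop above; NULL handling and escaping would still need care before reload):

    # Schema per table via mysqldump --no-data; data per table as tab-separated
    # text from a client-side batch SELECT, so nothing is written on the server.
    # Note: the first line of each .tsv is the column header from --batch output.
    for T in $TABLES; do
      mysqldump -h source-endpoint -u dbuser -p"$PASS" --no-data dbname "$T" > "$T.schema.sql"
      mysql -h source-endpoint -u dbuser -p"$PASS" --batch \
        -e "SELECT * FROM dbname.$T;" > "$T.tsv"
    done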
Secondarily, run the output through some Perl/Python to clean up possible errant junk characters, encoding issues, etc. from 8 years of 5 different DBAs and about 12 different data input formats/file types.
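Before any Perl/Python logic, something as simple as iconv can at least flag files whose bytes aren't valid in the intended encoding (a rough sketch; -c silently drops unconvertible bytes, so I'd diff the output against the original before trusting it):

    # Flag any exported file containing byte sequences that are not valid UTF-8,
    # then write a stripped copy; -c drops bytes it cannot convert.
    for F in *.tsv; do
      if ! iconv -f UTF-8 -t UTF-8 "$F" > /dev/null 2>&1; then
        echo "invalid UTF-8 bytes in: $F"
      fi
      iconv -f UTF-8 -t UTF-8 -c "$F" > "clean_$F"
    done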
Questions RE: Option B:
- My data has a lot of garbage characters that are true to the data; is pipe-delimiting the best choice?
- I've had horrendous errors loading into AWS RDS from TSVs and other basic dumps; suggestions beyond what is published in their data-loading whitepaper?
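For context, the shape of the reload I've been attempting is below. With RDS the LOCAL variant seems unavoidable since the file can't live on the server host; --local-infile has to be enabled on the client, and local_infile may need enabling in the instance's parameter group. Endpoint and table names are placeholders, and IGNORE 1 LINES matches the header row from the batch export above.

    # Load one table's TSV into the target RDS instance from the client machine.
    mysql --local-infile=1 -h target-endpoint -u dbuser -p \
      --default-character-set=utf8 dbname \
      -e "LOAD DATA LOCAL INFILE 'mytable.tsv' INTO TABLE mytable
          CHARACTER SET utf8
          FIELDS TERMINATED BY '\t'
          LINES TERMINATED BY '\n'
          IGNORE 1 LINES;"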