
To establish precedent, I reviewed the following:

  • Amazon-RDS database to Dedicated server
  • Mysqldump of the AWS RDS

I have not, however, found a clear-cut solution to my issues in these, only ancillary factoids throughout. I wanted to put together a comprehensive, specific guide to moving data into/around AWS RDS.

I did have a discussion at the Percona MySQL Performance Conference with a DBA consultant who works with RDS. He suggested the following, and apparently this is a frequent issue. I wanted additional input in an effort to help everyone.

**This is important for RDS users, according to a large MySQL consulting firm and the number of hands raised at the talk.**

  • Goal #1: Get the data out of your RDS instance. You can't move between major releases within RDS, and maybe you want to take it to an EC2 instance running Percona Server, etc.
  • Wish-list Goal #1a: Secondarily, clean/QA the data if possible.
  • Goal #2: Get the data back into another, smaller RDS instance, or another flavor/platform.

The facts:

  1. I have a large dataset (40-50 GB, primarily utf8, some latin1) in MySQL 5.5.x (< 5.5.8).
     1a. The old admin, for whatever reason, set our instance size to 1000 GB of dedicated storage that is mostly just padding (?).
  2. All tables are now, obviously, InnoDB; a couple of tables are approximately 14 GB each. That's another issue.
  3. Multiple encodings may exist: perhaps utf8, latin1, and Swedish-fkafkakfk? (See the query sketch below.)
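
To pin down fact #3 before picking a dump strategy, I plan to inventory what the server actually stores. A minimal sketch, where `username`, `your-rds-endpoint`, and `dbname` are placeholders for my own instance:

    # List the character set and collation actually in use, per column,
    # so it's clear up front what latin1 can and can't hold.
    mysql -u username -p -h your-rds-endpoint --batch -e "
        SELECT TABLE_NAME, COLUMN_NAME, CHARACTER_SET_NAME, COLLATION_NAME
        FROM information_schema.COLUMNS
        WHERE TABLE_SCHEMA = 'dbname'
          AND CHARACTER_SET_NAME IS NOT NULL
        ORDER BY CHARACTER_SET_NAME, TABLE_NAME;"

Anything that comes back as latin1_swedish_ci is where the "Swedish" is coming from.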

So with that, these are the options I am debating proceeding with:

Option A: Straight .sql dump; .sql reload

  1. Dump the schema, tables, etc. en masse as a standard executable .sql file, then reload.
  2. So I was thinking latin1 for portability:

         mysqldump -u username -p --default-character-set=latin1 -N database > backup.sql
         mysql -u username -p --default-character-set=latin1 database < backup.sql

Questions RE: Option A: Any suggestions on the commands above for chunking, integrity, and otherwise guaranteeing a smooth dump and reload? Any contingencies for columns that the information schema shows with variable encodings (i.e., an algorithm for handling what latin1 can't represent)?
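
Nothing is settled, but what I had in mind for the chunking/integrity side is roughly this untested sketch (endpoint and credentials are placeholders; note each table gets its own snapshot rather than one global one, so writes would need to stop if cross-table consistency matters):

    # Per-table dumps, each from a consistent InnoDB snapshot, with a checksum
    # per file so a failed reload can be retried one table at a time.
    # The inline password is a placeholder; a ~/.my.cnf would be cleaner.
    for TABLE in $(mysql -u username -p'secret' -h old-rds-endpoint -N -B \
                   -e "SHOW TABLES" database); do
        mysqldump -u username -p'secret' -h old-rds-endpoint \
                  --single-transaction --quick \
                  --default-character-set=latin1 \
                  database "$TABLE" > "backup_${TABLE}.sql"
        md5sum "backup_${TABLE}.sql" >> backup.md5
    done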

Option B: Individual ASCII file dumps of tables with schema + QA/schema refinement

  1. Dump these data in straight ASCII (charset? UTF-8? I must be careful?) into separate files per table, perhaps in chunks, for data QA.

     I would proceed with the following to output TSV data and schema:

         mysqldump --user=dbuser --password --tab=~/output/dir dbname

  2. Secondarily, run the output through some Perl/Python to clean up possible errant junk chars, encoding issues, etc. from 8 years of 5 different DBAs and about 12 different data input formats/file types. (A rough sketch of that pass follows this list.)
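
I haven't written that pass yet; as a placeholder, here is a minimal shell sketch of the kind of checks I mean (the real version would be Perl/Python), assuming the --tab dump above landed its .txt data files in ~/output/dir:

    # Flag files containing bytes that are not valid UTF-8, and strip ASCII
    # control characters (everything below 0x20 except tab and newline) into
    # *.clean.txt copies for inspection.
    for f in ~/output/dir/*.txt; do
        iconv -f UTF-8 -t UTF-8 "$f" > /dev/null 2>&1 || echo "non-UTF-8 bytes: $f"
        tr -d '\000-\010\013\014\016-\037' < "$f" > "${f%.txt}.clean.txt"
    done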

Questions RE: Option B:

  • My data has a lot of garbage chars that are true to the data; is pipe-delimited the best choice?
  • I've had horrendous errors loading into AWS RDS from TSVs and other basic dumps; any suggestions beyond what is published in their data loading whitepaper? (See the reload sketch below.)
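
For reference, the reload direction I've been testing looks roughly like this (sketch only; the endpoint, path, and tablename are placeholders, and it assumes the target instance permits LOAD DATA LOCAL INFILE):

    # Reload one pipe-delimited file into the target RDS instance.
    # LOCAL is needed because we can't place files on the RDS host itself.
    mysql --user=dbuser --password --host=new-rds-endpoint --local-infile=1 dbname \
          -e "LOAD DATA LOCAL INFILE '/output/dir/tablename.txt'
              INTO TABLE tablename
              CHARACTER SET utf8
              FIELDS TERMINATED BY '|' OPTIONALLY ENCLOSED BY '\"'
              LINES TERMINATED BY '\n';"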

1 Answer


I would strongly suggest Option C, which is to use an ETL tool to move the data. I have used the Community edition of Pentaho Data Integration very successfully to do the sort of things you're asking about (especially the cleanup). It's more work than a simple dump and load, but you have very good control and can do things like sync up again later and so on.
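
Once a transformation is built in the Spoon GUI, you can also run it headlessly, which is what makes the "sync up again later" part practical. A rough sketch (paths and file names are placeholders for whatever you build):

    # Run a saved PDI/Kettle transformation from the command line.
    ./pan.sh -file=/path/to/rds_migration.ktr -level=Basic

    # Jobs (e.g. one that loops a transformation over a list of tables)
    # run the same way with kitchen.sh.
    ./kitchen.sh -file=/path/to/rds_migration_job.kjb -level=Basic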
