I'm trying to move a large number of records from one MySQL instance to another inside RDS. They are on different VPCs and different AWS accounts, so I can't create a data pipeline that would do the copy for me.
I've written a quick Java program that connects to both the import database and the export database and does the following (sketched in code after the list):
- query the import database for the highest value of table.primary_key with
SELECT MAX(primary_key) FROM table
- get a result set from the export table with
SELECT * FROM table WHERE primary_key > max_from_import LIMIT 1000000
- create a PreparedStatement object from the import connection and set its query string to
INSERT INTO table (col1, ..., coln) VALUES (?, ..., ?)
- iterate over the result set, set the prepared statement's parameters to the values from the result cursor (with some minor manipulations to the data), call execute on the PreparedStatement object, clear its parameters, then move to the next result.
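For reference, here is a minimal sketch of that loop. The endpoints, credentials, table name (my_table), and columns (col1, col2) are placeholders standing in for my real schema:

import java.sql.*;

public class RowByRowCopy {
    public static void main(String[] args) throws SQLException {
        // Placeholder endpoints and credentials -- substitute the real RDS values.
        try (Connection exp = DriverManager.getConnection(
                 "jdbc:mysql://export-host:3306/db", "user", "pass");
             Connection imp = DriverManager.getConnection(
                 "jdbc:mysql://import-host:3306/db", "user", "pass")) {

            // Step 1: highest primary key already present on the import side.
            long max = 0;
            try (Statement st = imp.createStatement();
                 ResultSet rs = st.executeQuery("SELECT MAX(primary_key) FROM my_table")) {
                if (rs.next()) max = rs.getLong(1);
            }

            // Steps 2 and 3: chunk query on the export side, one reusable INSERT
            // statement on the import side.
            try (PreparedStatement sel = exp.prepareStatement(
                     "SELECT * FROM my_table WHERE primary_key > ? LIMIT 1000000");
                 PreparedStatement ins = imp.prepareStatement(
                     "INSERT INTO my_table (col1, col2) VALUES (?, ?)")) {
                sel.setLong(1, max);
                try (ResultSet rs = sel.executeQuery()) {
                    // Step 4: one network round trip per row -- the slow part.
                    while (rs.next()) {
                        ins.setLong(1, rs.getLong("col1"));
                        ins.setString(2, rs.getString("col2"));
                        ins.execute();
                        ins.clearParameters();
                    }
                }
            }
        }
    }
}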
With this method I'm able to import around 100,000 records per hour, but I know from this question that one way to optimize inserts is to avoid issuing a new query for each row and instead append more rows to each insert, i.e.
INSERT INTO table (col1, ..., coln) VALUES (val1, ..., valn), (val1, ..., valn), ..., (val1, ..., valn);
Does the JDBC driver know to do this, or is there some other optimization I can make on my end to improve insert run time?
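Building that multi-row statement by hand is possible but fiddly; a rough sketch of what I mean (my_table and the two columns are again placeholders):

import java.sql.*;

// Builds "INSERT INTO my_table (col1, col2) VALUES (?, ?), (?, ?), ..." for a
// fixed number of rows; parameters for row r then go at indices r * 2 + 1 and r * 2 + 2.
static PreparedStatement multiRowInsert(Connection imp, int rows) throws SQLException {
    StringBuilder sql = new StringBuilder("INSERT INTO my_table (col1, col2) VALUES ");
    for (int i = 0; i < rows; i++) {
        sql.append(i == 0 ? "(?, ?)" : ", (?, ?)");
    }
    return imp.prepareStatement(sql.toString());
}

Worth noting: MySQL's Connector/J driver has a rewriteBatchedStatements connection property, and with rewriteBatchedStatements=true on the JDBC URL it rewrites a batched insert into this multi-value form for you.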
UPDATE: Both answers recommended using addBatch and executeBatch, as well as disabling auto-commit. Disabling auto-commit alone gave a slight improvement (about 10%); batching brought the run time down to less than 50% of the individual inserts.
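For anyone landing here later, the batched version described above looks roughly like this; BATCH_SIZE, my_table, and the columns are placeholders to tune and rename for your schema:

import java.sql.*;

public class BatchedCopy {
    private static final int BATCH_SIZE = 5000; // placeholder; tune for row size

    public static void copy(Connection exp, Connection imp, long max) throws SQLException {
        imp.setAutoCommit(false); // commit once per batch instead of once per row

        try (PreparedStatement sel = exp.prepareStatement(
                 "SELECT * FROM my_table WHERE primary_key > ? LIMIT 1000000");
             PreparedStatement ins = imp.prepareStatement(
                 "INSERT INTO my_table (col1, col2) VALUES (?, ?)")) {
            sel.setLong(1, max);
            try (ResultSet rs = sel.executeQuery()) {
                int pending = 0;
                while (rs.next()) {
                    ins.setLong(1, rs.getLong("col1"));
                    ins.setString(2, rs.getString("col2"));
                    ins.addBatch();               // queue the row client-side
                    if (++pending == BATCH_SIZE) {
                        ins.executeBatch();       // one round trip for the whole batch
                        imp.commit();
                        pending = 0;
                    }
                }
                if (pending > 0) {                // flush the final partial batch
                    ins.executeBatch();
                    imp.commit();
                }
            }
        }
    }
}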