0

I'm trying to move a large number of records from one MySQL instance two another inside RDS. They are on different VPCs and different AWS accounts, so I can't create a data pipeline that would do the copy for me.

I've written a quick java program that connects to both the import database and the export database and does the following:

  1. query the import database for the highest in table.primary_key with SELECT MAX(primary_key) FROM table
  2. get a result set from the export table with SELECT * FROM table WHERE(primary_key > max_from_import) LIMIT 1000000
  3. create a PreparedStatement object from the import connection and set the queryString to INSERT INTO table (col1....coln) VALUES (?....n?)
  4. iterate over the result set and set the prepared statement columns to the ones from the result cursor (with some minor manipulations to the data), call execute on the PreparedStatement object, clear its' parameters, then move to the next result.

With this method I'm able to see around 100000 records being imported an hour, but I know that from this question that a way to optimize inserts is not to create a new query each time, but to append more data with each insert. i.e.

INSERT INTO table (col1...coln) VALUES (val1...valn), (val1...valn)....(val1...valn);

Does the jdbc driver know to do this, or is there some sort of optimization I can make on my end to improve insert run time?

UPDATE: Both answers recommended using the add and execute batch, as well as removing auto commit. Removing auto commit saw a slight improvement (10%), doing the batch yielded a run time of less than 50% of the individual inserts.

CBredlow
  • 2,790
  • 2
  • 28
  • 47
  • Unless you are using MyISAM tables, you will see enormous speed gain if you wrap your insert statements inside transactions, turn autocommit off and e.g. issue a commit every 100 or every 1000 insert. The exact detail illudes me right now, but pertaining to your question is the `rewriteBatchedStatements` connection option documented here https://dev.mysql.com/doc/connector-j/5.1/en/connector-j-reference-configuration-properties.html – nos Jun 14 '17 at 17:17
  • Perhaps `Statement#executeBatch()` would help. – Andrew S Jun 14 '17 at 17:22
  • @CBredlow - are you able to speed up insert using my method? – Fathah Rehman P Jun 14 '17 at 17:34
  • @FathahRehmanP writing it as we speak, I was mostly curious if the driver did the bulk itself, but the fact that you guys have all mentioned to turn autocommit off leads me to believe that that wasn't the case. – CBredlow Jun 14 '17 at 17:36
  • @CBredlow - if you do bulk insert with auto commit off you can insert really fast. If you try to insert as in my answer, you will get huge benefit in insert speed. – Fathah Rehman P Jun 14 '17 at 17:40

2 Answers2

1

First create a JDBC connection to Destination database and make its auto commit property to false.

After that in a loop do the following

Read N(for example 1000) number of rows from Source database and write that to destination database.

After some inserts commit destination database connection.

Sample code to get more idea is given below

Connection sourceCon = getSourceDbConnction();
Connection destCon = getDestinationDbConnction();
destCon.setAutoCommit(false);
int i=0;
String query;
while((query=getInsertQuery()!=null)
{         
     statement.executeUpdate(query);
     i++;
     if(i%10 == 0)
     {
       destCon.commit();
       i=0;
     }
}
destCon.commit();

The getInsertQuery function should give string in INSERT INTO table (col1...coln) VALUES (val1...valn), (val1...valn)....(val1...valn); format. Also it should return null, if all tables are processed.

If you are using Prepared Statements, you can use addBatch and executeBatch functions. Inside loop add values using addBatch function. After some inserts call executeBatch.

Fathah Rehman P
  • 8,401
  • 4
  • 40
  • 42
1

You need to use batch insert. Internally, Connector/J (MySQL JDBC driver) can rewrite batch inserts into multi values insert statements.

(Note that this is the default Connector/J behavior. You can add the option useServerPrepStmts=true to the JDBC url to enable server side prepared statements)

The code looks like the following:

 try(PreparedStatement stmt = connection.prepareStatement(sql)) {
    for(value : valueList) {   
      stmt.clearParameters();     
      stmt.setParameter(1, value);
      stmt.addBatch();
    }
    stmt.executeBatch();
 }

The code above will generate a multi value insert:

INSERT tablename(field) VALUES(value1), (value2), (value3) ...
nimrodm
  • 23,081
  • 7
  • 58
  • 59