I've been importing large CSV data files, usually fewer than 100,000 records each. I'm working with PHP and MySQL (InnoDB tables). I needed to use PHP to transform some fields and do some text processing prior to the MySQL INSERTs (part of process_note_data() in the code below). MySQL's LOAD DATA was not feasible, so please do not suggest it.
I recently tried to improve the speed of this process by wrapping it in a MySQL transaction using START TRANSACTION and COMMIT. The performance increase was surprising: processing time dropped by a factor of 20, so a 20-minute import now takes about 1 minute.
QUESTIONS.
1.) Does anyone understand why there was such a large performance increase (20 minutes down to 1 minute)?
2.) Should I be concerned about how big the transaction may get with 100,000 records?
3.) Should I be concerned about a large number of inserts and/or updates in the transaction?
/*
 * Customer Notes Data:
 * Rows are either a meeting, call or note!
 */
$row = 1;
$data = array();
$fields = array();
$line = '';

$db->query('SET autocommit=0;');
$db->query('START TRANSACTION;');

if (($handle = fopen("modules/".$currentModule."/Data/customernote.csv", "r")) !== FALSE) {
    while (($data = fgetcsv($handle, 4096, ',', '"')) !== FALSE && $row < 999000) {
        //Row 1 - CSV header row with field names
        if ($row == 1) {
            $csv_fields = $data;
        } elseif ($row > 1) {
            $fields = $this->process_note_data($data, $csv_fields, $row);
        }
        $row++;
    } // end while
    fclose($handle);
}

$db->query('COMMIT;');
$db->query('SET autocommit=1;');
Note: The text/field processing is done in the call to $this->process_note_data(), which then calls another helper class that contains the INSERT statement code. I didn't have enough room to include all of the code. $db->query() is a method on a typical database wrapper object that runs MySQL queries.
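For reference, here is a minimal sketch of roughly what the importer method and its INSERT might look like; since the actual process_note_data() and helper class code aren't posted, the table name, column names, the $this->db property, and its escape() method are assumptions for illustration only.

// Hypothetical sketch only -- not the actual code from the question.
// Assumes a customer_notes table and that the wrapper exposes escape().
public function process_note_data($data, $csv_fields, $row)
{
    // Map the CSV values onto the field names captured from the header row
    $fields = array_combine($csv_fields, $data);

    // ... text processing / field transformation happens here ...

    // The INSERT runs on the same connection, so it takes part in the
    // transaction opened before the CSV loop and is committed once at the end
    $sql = sprintf(
        "INSERT INTO customer_notes (note_type, note_text) VALUES ('%s', '%s');",
        $this->db->escape($fields['Type']),   // escape() is an assumed helper
        $this->db->escape($fields['Note'])
    );
    $this->db->query($sql);

    return $fields;
}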