
I am using PHPSpreadsheet to take spreadsheets a user uploads, add a column with certain values, save the file as CSV, and then import the CSV file with the following query:

LOAD DATA LOCAL INFILE '{$file}'
INTO TABLE {$table}
FIELDS TERMINATED BY ',' ENCLOSED BY '"'
LINES TERMINATED BY '\r\n'
IGNORE 1 LINES
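
For reference, the PHPSpreadsheet side of that pipeline looks roughly like this (a sketch only; the column letter, the value, and the file paths are placeholders):

use PhpOffice\PhpSpreadsheet\IOFactory;
use PhpOffice\PhpSpreadsheet\Writer\Csv;

// Load the uploaded workbook and append the extra column.
$spreadsheet = IOFactory::load($uploadedXlsx);
$sheet = $spreadsheet->getActiveSheet();
for ($r = 2, $max = $sheet->getHighestRow(); $r <= $max; $r++) {
    $sheet->setCellValue("E{$r}", $someValue); // 'E' and $someValue are placeholders
}

// Write it out as CSV, matching the delimiters the LOAD DATA query expects.
$writer = new Csv($spreadsheet);
$writer->setDelimiter(',');
$writer->setEnclosure('"');
$writer->setLineEnding("\r\n");
$writer->save($file);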

Alternatively I can do something like:

$stmt = $pdo->prepare("INSERT INTO {$table} (col_a, col_b) VALUES (?, ?)"); // columns are placeholders
foreach ($rows as $row) {
    $stmt->execute($row); // re-use the one prepared statement per row
}

The spreadsheets will all have the same columns/data-types.

What would be the most efficient way to do this? Going from Xlsx -> CSV -> MySQL Import seems like I am adding extra steps.

Bijan
  • I really hope that `$table` is not a user-supplied argument or this query is trouble. – tadman Oct 31 '18 at 22:57
  • Don't worry, `$table` is not user-supplied :) – Bijan Oct 31 '18 at 22:58
  • Read the Excel file directly into PHP and then loop through it. Looping through it allows you to better control anomalies, even if it's a little less efficient. – Difster Oct 31 '18 at 22:59
  • Then I also hope that `$file` isn't, because that's got the same problems. Using placeholder values whenever possible is a good idea. Anything else creates huge risks. – tadman Oct 31 '18 at 22:59
  • It looks like the only way to really "simplify" it, in terms of removing steps from the process, would be to go from xlsx directly to mysql, which would mean iterating the rows with phpspreadsheet and doing a bunch of separate inserts. phpspreadsheet is probably going to iterate the rows anyway when it creates the CSV, which seems to reduce the problem to LOAD vs. multiple inserts, unless I'm missing something. – Don't Panic Oct 31 '18 at 23:05
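
A minimal sketch of the direct xlsx-to-MySQL approach described in that last comment (the column list and the connection handle are placeholders; uses PhpSpreadsheet's row iterator):

use PhpOffice\PhpSpreadsheet\IOFactory;

$spreadsheet = IOFactory::load($uploadedXlsx); // $uploadedXlsx: placeholder path
$sheet = $spreadsheet->getActiveSheet();
$stmt = $pdo->prepare("INSERT INTO {$table} (col_a, col_b) VALUES (?, ?)"); // placeholder columns

foreach ($sheet->getRowIterator(2) as $row) { // start at row 2 to skip the header
    $values = [];
    foreach ($row->getCellIterator() as $cell) {
        $values[] = $cell->getValue();
    }
    $stmt->execute($values);
}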

1 Answer


MySQL's direct CSV import is usually the fastest option, but it is not without limitations. One is that the import is all-or-nothing: you can't tell how far along it is until it's done, which matters when an import takes hours or even days. The entire insert on an InnoDB table happens atomically for performance reasons, so none of the rows are visible until the whole operation is committed.

Another is that the file must be present on the server. The LOCAL option is a quirky feature of the mysql command-line tool and probably won't work in your database driver unless it is explicitly supported or emulated.
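
With PDO's MySQL driver, for instance, LOCAL can usually be switched on at connect time (a sketch; the DSN and credentials are placeholders):

// LOAD DATA LOCAL INFILE is disabled by default in PDO's MySQL driver.
$pdo = new PDO('mysql:host=localhost;dbname=mydb;charset=utf8mb4', $user, $pass, [
    PDO::MYSQL_ATTR_LOCAL_INFILE => true,
]);

// quote() guards the embedded file path; "\\r\\n" keeps the literal \r\n for MySQL.
$pdo->exec("LOAD DATA LOCAL INFILE " . $pdo->quote($file) . "
            INTO TABLE {$table}
            FIELDS TERMINATED BY ',' ENCLOSED BY '\"'
            LINES TERMINATED BY '\\r\\n'
            IGNORE 1 LINES");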

Inserting row-by-row with a CSV parser is almost always slower. If you must go that route, be sure to prepare the INSERT statement once and re-use it in the loop, or do a "multi-INSERT" with as many rows as will fit in your maximum statement size (max_allowed_packet).
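
A rough sketch of the multi-INSERT variant (the batch size and column names are placeholders; assumes a PDO handle):

$batchSize = 500; // placeholder: tune against max_allowed_packet
foreach (array_chunk($rows, $batchSize) as $chunk) {
    // Build one placeholder group per row: (?, ?), (?, ?), ...
    $groups = implode(', ', array_fill(0, count($chunk), '(?, ?)'));
    $stmt = $pdo->prepare("INSERT INTO {$table} (col_a, col_b) VALUES {$groups}");

    // Flatten the chunk into a single flat parameter list.
    $params = [];
    foreach ($chunk as $row) {
        foreach ($row as $value) {
            $params[] = $value;
        }
    }
    $stmt->execute($params);
}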

tadman