
I'm creating a PHP script that imports some data from text files into a MySQL database. These text files are pretty large; an average file has about 10,000 lines, each of which corresponds to a new item I want in my database. (I won't be importing files very often.)

I'm worried that reading a line from the file and then doing an INSERT query, 10,000 times in a row, might cause some issues. Is there a better way for me to do this? Should I perform one INSERT query with all 10,000 values? Or would that be just as bad?

Maybe I can find a middle ground and insert something like 10 or 100 entries at once. Really, my problem is that I don't know what good practice is. Maybe 10,000 queries in a row is fine and I'm worrying for nothing.

Any suggestions?

nate

3 Answers


Yes, it is:

<?php
// Read the file into an array, one element per line; without
// FILE_IGNORE_NEW_LINES, file() keeps the trailing newline in each value
$lines = file('file.txt', FILE_IGNORE_NEW_LINES | FILE_SKIP_EMPTY_LINES);

// Build one "('value')" tuple per line, escaping each value so quotes
// in the data cannot break the query (assumes an open MySQL connection)
$values = array();
foreach ($lines as $line) {
    $values[] = "('" . mysql_real_escape_string($line) . "')";
}

// Join the tuples into a single multi-row INSERT
$query = "INSERT INTO table VALUES " . implode(',', $values);
echo $query;

http://sandbox.phpcode.eu/g/5ade4.php

This builds one single query, which is many times faster than the one-query-per-line style!

genesis
  • But be careful, because the size of the query must be less than your max_allowed_packet parameter for the server! (see my.ini; a batching sketch follows these comments) – spacediver Jul 24 '11 at 08:01
  • @nate: Please show your support by [accepting my answer via the button shown in this image](http://i.imgur.com/uqJeW.png) – genesis Jul 24 '11 at 10:46
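
If a single statement would exceed max_allowed_packet, one option (a sketch in the style of the answer above, not part of it; the batch size of 500 is arbitrary) is to chunk the lines and issue one multi-row INSERT per chunk:

<?php
// Sketch: split the lines into batches of 500 so each INSERT stays
// well under max_allowed_packet; assumes an open MySQL connection
$lines = file('file.txt', FILE_IGNORE_NEW_LINES | FILE_SKIP_EMPTY_LINES);

foreach (array_chunk($lines, 500) as $chunk) {
    $values = array();
    foreach ($chunk as $line) {
        $values[] = "('" . mysql_real_escape_string($line) . "')";
    }
    mysql_query("INSERT INTO table VALUES " . implode(',', $values));
}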

Use prepared statements, as suggested by the authors of High Performance MySQL. They save a lot of time, since the server parses the SQL only once and the repeated query text is not sent over the wire for every row.
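
A minimal sketch of this approach using PDO; the DSN, credentials, and the items(value) table are assumptions for illustration:

<?php
// Connect and make PDO throw exceptions on errors
$pdo = new PDO('mysql:host=localhost;dbname=test', 'user', 'pass');
$pdo->setAttribute(PDO::ATTR_ERRMODE, PDO::ERRMODE_EXCEPTION);

// Prepare once: the server parses the statement a single time
$stmt = $pdo->prepare('INSERT INTO items (value) VALUES (?)');

// Execute once per line: only the bound value travels each time
foreach (file('file.txt', FILE_IGNORE_NEW_LINES) as $line) {
    $stmt->execute(array($line));
}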

Dor
  • Oh nice! I had no idea anything like this existed. I'll def. use prepared statements for my import. Thanks! – nate Jul 24 '11 at 01:07

I would do it in one large query with all the values at once. Just to be sure, though, make sure you run START TRANSACTION; before and COMMIT; afterwards, so that if something goes wrong during the execution of the query (which is possible, since it will most likely run for a fairly long time), the database will not be affected.
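
A sketch of that transactional wrapper using PDO; the connection details and the $query variable (the large multi-row INSERT built beforehand) are assumed:

<?php
$pdo = new PDO('mysql:host=localhost;dbname=test', 'user', 'pass');
$pdo->setAttribute(PDO::ATTR_ERRMODE, PDO::ERRMODE_EXCEPTION);

$pdo->beginTransaction();        // START TRANSACTION;
try {
    $pdo->exec($query);          // the large multi-row INSERT
    $pdo->commit();              // COMMIT; all rows appear at once
} catch (Exception $e) {
    $pdo->rollBack();            // undo everything on failure
    throw $e;
}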

EdoDodo
  • Beware that issuing a multi-row insert will reduce your ability to trace errors. If one of the rows fails to insert, you won't be able to detect which one triggered the error (and, of course, the whole batch will be dropped instead of at least keeping the good ones). – marcelog Jul 23 '11 at 23:40
  • True. I'm assuming all the data being inserted is sane, so no row should fail to insert. Validation should be performed on the data before it is added to the query string to make sure of this. – EdoDodo Jul 23 '11 at 23:42
  • IMNSHO, losing the entire transaction is preferable to a partial transaction being committed. If validation is likely to be a problem, I'd try to solve that in the framework of a single transaction, for instance by first inserting into a temporary table, querying between the new data and existing data for conflicting rows, and then transferring data from the temp table to the main table once I'm sure everything's A-OK (a rough sketch follows this thread). This would be the best of both worlds, at the cost of somewhat greater code complexity. – SingleNegationElimination Jul 23 '11 at 23:48
  • I agree that losing the transaction is *generally* what you'll want. It all depends on the actual needs of the application. – marcelog Jul 23 '11 at 23:56
  • Yeah, only a part of the rows being inserted would probably be the worst thing possible. If that happened you would have to manually check the database to see how far the import got, and then edit the import file to remove the rows that were already inserted, and re-run the import script. – EdoDodo Jul 23 '11 at 23:56
  • Okay, thanks for the feedback. Yeah, I probably would rather have the whole thing fail as opposed to part of it. This import will not be run automatically, so a user will be able to see and tweak things if something goes wrong. – nate Jul 24 '11 at 01:06
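
A rough PDO sketch of the temp-table flow described in the comments above; the items table, its value column, and the connection details are all hypothetical:

<?php
$pdo = new PDO('mysql:host=localhost;dbname=test', 'user', 'pass');
$pdo->setAttribute(PDO::ATTR_ERRMODE, PDO::ERRMODE_EXCEPTION);

$pdo->beginTransaction();
try {
    // Stage the new rows in a temporary table first
    $pdo->exec('CREATE TEMPORARY TABLE staging LIKE items');
    $stmt = $pdo->prepare('INSERT INTO staging (value) VALUES (?)');
    foreach (file('file.txt', FILE_IGNORE_NEW_LINES) as $line) {
        $stmt->execute(array($line));
    }

    // Abort if any staged row conflicts with existing data
    $conflicts = $pdo->query(
        'SELECT COUNT(*) FROM staging s JOIN items i ON i.value = s.value'
    )->fetchColumn();
    if ($conflicts > 0) {
        throw new Exception("$conflicts conflicting rows found");
    }

    // All clear: move everything into the main table at once
    $pdo->exec('INSERT INTO items SELECT * FROM staging');
    $pdo->commit();
} catch (Exception $e) {
    $pdo->rollBack();
    throw $e;
}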