2

I have to do a import in db from large csv file, about 100,000 records.

To do this I have tried dql and with the orm, but with the two options it takes me about 9 hours to complete the process.

I made the same load with Node.js and it was going much faster, about 5 minutes.

So I do not know if there is any option.

I have tried to clean and flush after every 20 lines of the file but it is still slow

Any idea how to improve this performance.

Thank you.

Tlaloc-ES
  • 4,825
  • 7
  • 38
  • 84

1 Answers1

2

Depending on how your import looks, you might want to bypass the Doctrine ORM entirely and get the Connection from the Entity Manager to work with the DBAL. This way you can just retrieve the array from the CSV and then insert them.

$dbal= $this->entityManager->getConnection();
$handle = fopen('exported_data.csv', 'r');
while (($row = fgetcsv($handle)) !== false)) {
    if (null === $row) {
        // Deal with invalid csv data
    }
    // maybe map row names to column names
    $dbal->insert('table_name', $row);
}
fclose($handle);

This should already improve things a bit. Next you have to see if transactions and inserting in small batches makes sense. Fortunately since you don't have to deal with object hydration and the unit of work from the ORM you don't have to clean every time.

You might also want to look at the Doctrine Batch Utils from ocramius, one of the Doctrine contributors, for your task.

If afterwards you still have performance problems you should start profiling your application with tools like blackfire or xhprof. This will tell you, which part of your code is actually causing the performance issue. It could be disk I/O and even fgetcsv() can sometimes be slow, because of the way it reads in the data chunks, but that is a performance impact on such a low level that I wouldn't bother with it, unless a profiler tells me to.

dbrumann
  • 16,803
  • 2
  • 42
  • 58
  • Only with dbal, I have gone from 2 insert per second to 40 insert per second, It would be possible to reach 100 or 500 insert per second, with a good internet connection? – Tlaloc-ES Oct 17 '18 at 09:25
  • Could be. As I said, it would be best to do a profiling. Blackfire is easy to set up and they offer a trial period where you can get the full report for 14 days or 30 days, I'm not sure. It will tell you where your bottlenecks are, e.g. if network is the problem or maybe some part of your php code. Disclaimer: I work for SensioLabs, so technically I am affiliated with Blackfire. But try out the free version, it's a cool tool for finding performance problems. :) – dbrumann Oct 17 '18 at 09:42
  • 1
    ok, I will try the trial version, great answer and thanks for all. – Tlaloc-ES Oct 17 '18 at 09:45