
To summarize my problem:

In my model, a Contact is owned by a Company, so when I create the Contact, I need to do something like:

$contact->setCompany($company);

Let's say I need to create a Company and a Contact object for each row of an Excel file. If I encounter a known Company, I do not want to create it again; if another Contact belongs to the same Company, I want to reuse the already persisted Company. To do that, I'm doing the following:

$newCompanies = array();

foreach ($rows as $row) {
    // $companyName is read from the current $row
    $company = $this->entityManager->getRepository(Company::class)->findOneBy(array("name" => $companyName)); // try to find an existing Company in the DB

    if ($company === null) {
        if (!isset($newCompanies[$companyName])) {
            $company = new Company();
            $company->setName($companyName);

            $this->entityManager->persist($company);
            $newCompanies[$companyName] = $company;   // save the company in an array so we can use it later
        } else {
            $company = $newCompanies[$companyName];
        }
    }

    $contact = new Contact();
    $contact->setCompany($company);
    [...]
}

The problem is that the array keeps growing, and I think the PHP memory limit is reached every time with (let's say) 5000+ rows.

Is there a cleaner solution than keeping my objects in an array, without raising memory_limit in php.ini?

Thank you

Tibo
  • Does this answer your question? [When inserting an entity with associations, is there a way to just use the FK instead of retrieving the entity?](https://stackoverflow.com/questions/5382170/when-inserting-an-entity-with-associations-is-there-a-way-to-just-use-the-fk-in) – Code Spirit Jul 30 '20 at 15:08
  • Not really: in the accepted answer, the Posts are already in the database. I can't get a Company with $em->getRepository(Company::class)->findOneBy() because my Companies are only in a persistent state (persisted but not yet flushed). That's why I store each persisted $company in an array, so I can get it later on. – Tibo Jul 30 '20 at 15:16
  • What do you mean by "persistent state"? The posts are not the main focus in the answer but the tags (companies in your case). If you have persisted a company you can get a reference to it. – Code Spirit Jul 30 '20 at 15:20
  • 1 Post for multiple Tags <=> 1 Company for multiple Contacts. But in the accepted answer, the Post is retrieved by its ID with $entityManager->getRepository(...)->find($id). My Company does not have an ID yet, because I haven't called $entityManager->flush(). – Tibo Jul 30 '20 at 15:29

1 Answer


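You can flush changes to the database in batches, clear the EntityManager, and reset (reassign) the $newCompanies array, as proposed in the Doctrine batch processing documentation.

$batchSize = 20;
foreach ($rows as $i => $row) { // assumes $rows is a 0-indexed list
    // ... look up or create the Company and create the Contact, as in the question ...

    // flush and clear after every $batchSize rows
    if ((($i + 1) % $batchSize) === 0) {
        $this->entityManager->flush();
        $this->entityManager->clear();   // detaches every managed entity to free memory
        $newCompanies = [];              // the cached Companies are now detached, so start over
    }
}
$this->entityManager->flush(); // flush the last, incomplete batch
$this->entityManager->clear();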
shvv
  • Does that mean I need to flush after each row? flush() takes a lot of time, if I'm not mistaken. – Tibo Jul 30 '20 at 15:32
  • After `$batchSize` rows. Twenty in this example. – shvv Jul 30 '20 at 15:33
  • What if I want to flush only at the end of the processing, so that if something goes wrong, nothing is written? With your solution, rows are created in the database even if something fails before the end of the processing. – Tibo Jul 30 '20 at 15:40
  • I can't see your full code. Do you validate or remove any Company after the loop? – shvv Jul 30 '20 at 15:44
  • My Excel sheet contains this info for each row: the Company and the Contact. The Company has a name and, let's say, the Contact has a first name and a last name. My database already contains many Companies, so the Excel sheet may include some of the same Companies. If I see a new Company name, I create it (without flushing, because I want to check all rows first) and then add the Contact to it. If I see an existing Company, I use it. BUT: if my Company didn't already exist in the database, I want to create it and add to it all the Contacts from my Excel sheet that belong to this Company. – Tibo Jul 30 '20 at 15:49
  • Since the Company doesn't have an ID yet, I can't get it with $em->getRepository()->find($id). And since I haven't called flush() yet, I need to store my $newCompany in an array to get it back for the next rows. – Tibo Jul 30 '20 at 15:51
  • I don't see a problem at all. You flush the companies after some rows, which persists them to the DB. In the next iteration you look up the company with `findOneBy` from the DB, not from the array. I think it's a good compromise between memory and execution time. – shvv Jul 30 '20 at 16:03
  • I'd like to avoid this: if the last row contains errors, none of the previous rows should be persisted in my database. That said, it's a huge improvement in execution time :) so thank you a lot. – Tibo Jul 30 '20 at 16:14
  • You could additionally wrap it in a transaction so that you can roll back if your last row contains an error. – Jakumi Jul 30 '20 at 16:51
  • Yes, starting a transaction also has a positive impact on flush performance, because the indices are not rebuilt after saving the new rows. – olidem Jul 30 '20 at 23:11
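Following Jakumi's suggestion, here is a minimal sketch that keeps the batched flush()/clear() from the answer but runs the whole import inside one DBAL transaction, so an error on the last row rolls back every batch flushed before it. It assumes the Company/Contact entities and setters from the question, a Doctrine EntityManagerInterface, and a hypothetical 'company' key in each row; importRows() is a hypothetical name, and each Contact is persisted explicitly here.

```php
<?php

use Doctrine\ORM\EntityManagerInterface;

/**
 * Sketch: batched import inside ONE transaction (hypothetical helper).
 * Assumes Company/Contact are the entities from the question and that each
 * $row is an array with a 'company' key (hypothetical column name).
 */
function importRows(EntityManagerInterface $em, iterable $rows, int $batchSize = 20): void
{
    $connection = $em->getConnection();
    $connection->beginTransaction(); // nothing is committed until commit() below

    try {
        $newCompanies = []; // Companies persisted in the current batch only
        $i = 0;

        foreach ($rows as $row) {
            $companyName = $row['company']; // hypothetical column

            // Companies flushed in earlier batches are visible to this query,
            // because it runs on the same connection, inside the open transaction.
            $company = $em->getRepository(Company::class)->findOneBy(['name' => $companyName]);

            if ($company === null) {
                // Persisted but not yet flushed: only reachable through the array.
                $company = $newCompanies[$companyName] ?? null;
                if ($company === null) {
                    $company = new Company();
                    $company->setName($companyName);
                    $em->persist($company);
                    $newCompanies[$companyName] = $company;
                }
            }

            $contact = new Contact();
            $contact->setCompany($company);
            // ... set the Contact's other fields from $row ...
            $em->persist($contact);

            if ((++$i % $batchSize) === 0) {
                $em->flush();       // INSERTs are executed but stay uncommitted
                $em->clear();       // detach everything to free memory
                $newCompanies = []; // the cached entities are now detached
            }
        }

        $em->flush();           // last, incomplete batch
        $connection->commit();  // make every batch permanent at once
    } catch (\Throwable $e) {
        $connection->rollBack(); // undo every batch flushed so far
        throw $e;
    }
}
```

With this layout $newCompanies only ever holds the Companies of the current batch, since anything flushed earlier is found again by findOneBy() inside the same transaction; memory stays bounded while the "all or nothing" behaviour Tibo asked for is preserved.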