
I'm creating an application in Symfony, and I need to retrieve a large number of customer records (tens of thousands) from an external API endpoint, then store them in a database via Doctrine. The API returns only 100 results at a time, so the data is paginated into a few hundred pages. Fetching everything synchronously resulted in a very long wait before the process ran out of memory (not surprising), so I pulled the code out into a message handler, which essentially looks like this:

    public function __invoke(GetCustomersMessage $message): void
    {
        // Fetch one page (100 customers) from the external API.
        $response = $this->client->request('GET', $message->getQueryUrl() . '&page=' . $message->getPage(), [
            'auth_basic' => $this->authCreds,
        ]);
        $statusCode = $response->getStatusCode();
        if ($statusCode !== 200) {
            throw new \Exception('Error: ' . $statusCode);
        }
        $content = $response->getContent();
        // Decode to an associative array; throw on malformed JSON instead of silently getting null.
        $data = json_decode($content, true, 512, JSON_THROW_ON_ERROR);
        $result = $this->createCustomers($data, $message->getStoreId());
        // If we successfully added all new customers, get another batch.
        if ($result === true) {
            $this->bus->dispatch(new GetCustomersMessage($message->getQueryUrl(), $message->getPage() + 1, $message->getStoreId()));
        }
    }

So essentially I query the API, then try to add those customers to my database, and if they all get added successfully, I dispatch a message to get the next batch. The message transport is the async Doctrine transport. I spun up a worker to consume the messages, and it synced far more customers than the previous attempt, but it still ran out of memory after about 250. I was surprised to see, though, that when the worker died, it neither left the final message as incomplete nor moved it to the "failed" queue. The message was just gone, so when I started another worker, it was unable to pick up where the previous one left off.

This is my first time attempting a messenger/bus architecture. Am I approaching this wrong? I considered queueing up all the messages at once, but I believe I'd still lose the data contained in whichever message the worker died on. Also, the intention is that this runs whenever we need to sync customers, so it stops when it reaches a record we already have in the database. If I queued a message for every batch, it would make a few hundred useless calls on every sync after the first. Is there a way to monitor a worker and kill it before it reaches the memory limit?

ReeceNG
  • How are you running your worker? Symfony has some notes on the subject that might help you: https://symfony.com/doc/current/messenger.html#deploying-to-production – Chris Haas Jul 01 '22 at 21:10
  • Ah, I was planning to use supervisor but somehow hadn't considered just including an arbitrary limit per worker. That's probably the easiest solution, thank you. – ReeceNG Jul 01 '22 at 21:39
  • With supervisor you can set the worker's timeout; with the help of some logic in your code, this ensures that the new worker begins at the exact point where the last process stopped. – Francisco Jul 01 '22 at 23:13
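
The per-worker limit discussed in these comments can be illustrated with a small sketch. Symfony's `messenger:consume` command accepts `--limit`, `--memory-limit` and `--time-limit` options for this purpose. The code below is not Symfony's implementation: it is a plain-PHP stand-in, and `runWorker`, its parameters, and the array used as a queue are all hypothetical names chosen for illustration. It only shows the idea of checking limits between messages and exiting cleanly so that a process manager such as supervisor can start a fresh worker.

```php
<?php
declare(strict_types=1);

/**
 * Hypothetical stand-in for a worker loop (not Symfony's own code).
 * Handles queued messages until a message-count or memory limit is hit,
 * then returns so a process manager can start a fresh worker.
 *
 * @param array         $queue       pending messages (plain array, for illustration only)
 * @param callable      $handler     called once per message
 * @param int           $maxMessages stop after handling this many messages
 * @param int           $memoryLimit stop once memory usage reaches this many bytes
 * @param callable|null $memoryUsage returns current usage in bytes (injectable for testing)
 *
 * @return array{handled: int, remaining: array, reason: string}
 */
function runWorker(
    array $queue,
    callable $handler,
    int $maxMessages,
    int $memoryLimit,
    ?callable $memoryUsage = null
): array {
    // Default to the real memory reading when no test double is supplied.
    $memoryUsage ??= static fn (): int => memory_get_usage(true);
    $handled = 0;

    while ($queue !== []) {
        $handler(array_shift($queue));
        ++$handled;

        // Limits are checked only between messages, never in the middle of one,
        // so the unhandled messages are still in the queue for the next worker.
        if ($handled >= $maxMessages) {
            return ['handled' => $handled, 'remaining' => $queue, 'reason' => 'message limit'];
        }
        if ($memoryUsage() >= $memoryLimit) {
            return ['handled' => $handled, 'remaining' => $queue, 'reason' => 'memory limit'];
        }
    }

    return ['handled' => $handled, 'remaining' => $queue, 'reason' => 'queue empty'];
}
```

Calling `runWorker($pages, $handler, 50, 128 * 1024 * 1024)` would handle at most 50 messages and return the rest untouched. The design point is that the worker stops on its own well below PHP's hard memory limit, instead of being killed partway through a message.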

0 Answers