I am using PHP, Laravel, Redis, and SQL on an Ubuntu localhost server. I have written several methods that return results from API searches after some processing. I call five of these methods, which is very slow when done synchronously, so I have been experimenting with async approaches (which I know PHP isn't optimised for). After trying a few approaches I have had some success with pcntl_fork(), but I'm running into some nasty problems.

This is my code (I have a few custom classes for making and processing the API calls; these methods work flawlessly on their own):

//this caches the individual api results to a Redis list
public static function cacheAsyncApiSearch(string $searchQuery, int $maxResults = 20)
{
        $key = "search:".$searchQuery; //for Redis

        if(!Redis::client()->exists($key)) {

            for ($i = 0; $i < 5; $i++) {
                // Create a child process
                $pid = pcntl_fork();

                if ($pid == -1) {
                    // Fork failed
                    exit(1);
                } elseif ($pid) {
                    // This is the parent process
                    // I have tried many versions of pcntl_wait and none of them work. None of them
                    // allow code to run afterwards (even within this elseif block), and at best
                    // only the first API case (YouTube) gets cached.

//                    while (!pcntl_wait($status, WNOHANG)) {
//                        $exitStatus = pcntl_wexitstatus($status);
//                        // Do something with the exit status of the child process
//                    }
//                    dd($pid);
//                    pcntl_waitpid($pid, $status, WUNTRACED);
                } else {
                    //child processes
                    switch ($i) {
                        case 0:
                            $results = YouTube::search($searchQuery, $maxResults)['results'];
                            Redis::client()->rPush($key,SearchResultDTO::jsonEncodeArray($results));
                            SearchResultDTO::convertResultDTOToModels($results);
                            break;
                        case 1:
                            $results = Dailymotion::search($searchQuery, $maxResults)['results'];
                            Redis::client()->rPush($key,SearchResultDTO::jsonEncodeArray($results));
                            SearchResultDTO::convertResultDTOToModels($results);
                            break;
                        case 2:
                            $results = Vimeo::search($searchQuery, $maxResults)['results'];
                            Redis::client()->rPush($key,SearchResultDTO::jsonEncodeArray($results));
                            SearchResultDTO::convertResultDTOToModels($results);
                            break;
                        case 3:
                            $results = Twitch::search($searchQuery, 2)['results'];
                            Redis::client()->rPush($key,SearchResultDTO::jsonEncodeArray($results));
                            SearchResultDTO::convertResultDTOToModels($results);
                            break;
                        case 4:
                            $results = Podcasts::getPodcastsFromItunesResults(Podcasts::search($searchQuery, 2)["response"]->results);
                            Redis::client()->rPush($key,SearchResultDTO::jsonEncodeArray($results));
                            SearchResultDTO::convertResultDTOToModels($results);
                            break;
                    }
                    $i = 10000;
                    exit(0); 
                }
            }


            // for noting the process id of the given process that gets to this point
            Redis::client()->lPush("search_pid:".$searchQuery, $pid);
            
            // sets a time out for the redis cache
            Redis::client()->expire($key, 60*60*4);


            while (is_numeric( Redis::client()->lLen($key)) && Redis::client()->lLen($key) < 5) {
                usleep(500000); // 0.5 seconds
//                pcntl_waitpid(-1, $status); //does this even do anything? not for me
            }
            return false; // not already cached
        }
        return true; // already cached
    }

This code partly works: it performs the API calls and caches the results in Redis correctly. However, once the method has been called, no code after it runs (unless Redis already holds a cached version, in which case the process is not forked).

This made me think that all processes were exiting (possibly true? If so, I don't know why), so I tried a version without the exit(0) line. That works in the sense that code after the method call now runs. However, I noticed (when I started getting SQL race conditions) that all 6 processes (5 children, 1 parent) carried on and each ran its own copy of the code after this method (e.g. some database writes):

public static function search(string $searchQuery, int $maxResults = 20): array
    {
        $key = "search:".$searchQuery;
        $results = [];

        // the quoted method above
        self::cacheAsyncApiSearch($searchQuery, $maxResults);

        foreach (Redis::client()->lRange($key,0,-1) as $result){
            $results = array_merge($results, SearchResultDTO::jsonDecodeArray($result));
        }

        $creatorDTOs = [];
        $videoDTOs = [];
        $streamDTOs = [];
        $playlistDTOs = [];
        $podcastDTOs = [];

        /** @var SearchResultDTO $result */
        foreach ($results as $result) {
            match ($result->kind) {
                Kind::Creator => $creatorDTOs[] = $result,
                Kind::Video => $videoDTOs[] = $result,
                Kind::Stream => $streamDTOs[] = $result,
                Kind::Playlist => $playlistDTOs[] = $result,
                Kind::Podcast => $podcastDTOs[] = $result,
            };
        }

        // did this to test how many times the code was being run (the list ends up with six 1's in it)
        Redis::client()->lPush("here", '1');

        // I know this isn't efficient since I already called these conversion methods earlier; for now I am just trying to get the forking to work.
        return [
            "creators" => SearchResultDTO::convertResultDTOToModels($creatorDTOs),
            "videos" => SearchResultDTO::convertResultDTOToModels($videoDTOs),
            "streams" => SearchResultDTO::convertResultDTOToModels($streamDTOs),
            "playlists" => SearchResultDTO::convertResultDTOToModels($playlistDTOs),
            "podcasts" => SearchResultDTO::convertResultDTOToModels($podcastDTOs)
        ];
    }

These DTOs (Data Transfer Objects) are used to populate a UI. So, for example, when I make a search that isn't cached, the page stays blank forever. But if I refresh the page (after the search is cached), the results show just fine.

This is the most bizarre problem I have ever run into, and I would really appreciate any help.

Edit (please read): After some experimenting I have found that if I remove the while loop, the code afterward executes properly. I removed the while loop from cacheAsyncApiSearch and placed it in the second method, search. However, it still freezes the system. This makes no sense, as there shouldn't be an infinite loop: if I manually query the Redis db, all 5 results are there. Also, the dd("two") is never executed unless the usleep() is removed. Hopefully this narrows the problem down.

Edit 2 (please read): I have figured out that I can get the dd("two") to run when usleep() is reduced from 0.5 seconds to 0.05 seconds, but it still doesn't seem to wait long enough for it to work.

if(!self::cacheAsyncApiSearch($searchQuery, $maxResults))
            {
                // make sure Redis is properly returning a number not object
                $len = Redis::client()->lLen($key);
                while(!is_numeric($len)){
                    usleep(500000); // 0.5 seconds
                    $len = Redis::client()->lLen($key);
                }
                //dd($len); //this dd() works
                while ($len < 5) {
                    dd("one"); // this dd() works

                    usleep(500000); // 0.5 seconds
                    dd("two"); // this does not work, why?

                    $len = Redis::client()->lLen($key);
                }

            }
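
For reference, the fork-and-reap pattern that cacheAsyncApiSearch is aiming for can be sketched in isolation, without Laravel or Redis. This is only a minimal sketch under stated assumptions: it is run from the PHP CLI with the pcntl extension loaded, and the `$tasks` closures are hypothetical placeholders for the real API methods. It shows each child exiting after its own work and the parent waiting on every child before continuing.

```php
<?php
// Hypothetical placeholders for the real API searches; each closure just
// sleeps briefly to simulate a slow network call.
$tasks = [
    'youtube'     => fn () => usleep(100000),
    'dailymotion' => fn () => usleep(100000),
    'vimeo'       => fn () => usleep(100000),
    'twitch'      => fn () => usleep(100000),
    'podcasts'    => fn () => usleep(100000),
];

$children = []; // pid => task name

foreach ($tasks as $name => $task) {
    $pid = pcntl_fork();

    if ($pid === -1) {
        fwrite(STDERR, "fork failed for {$name}\n");
        exit(1);
    }

    if ($pid === 0) {
        // Child: do the work, then exit so it never falls through into
        // the code that only the parent should run.
        $task();
        exit(0);
    }

    // Parent: remember the child and keep forking.
    $children[$pid] = $name;
}

// Parent only: block until every child has been reaped.
$exitCodes = [];
foreach ($children as $pid => $name) {
    pcntl_waitpid($pid, $status);
    $exitCodes[$name] = pcntl_wifexited($status) ? pcntl_wexitstatus($status) : -1;
}

foreach ($exitCodes as $name => $code) {
    echo "{$name} exited with {$code}\n";
}
```

In this sketch the parent never polls with usleep(); it relies on pcntl_waitpid() to know when each child has finished. Whether the same pattern behaves this way inside a web request is a separate matter, since the PHP manual advises against enabling process control in a web server environment, and a forked child inherits the parent's open connections.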
  • "on an Ubuntu localhost server"? Surely you are not trying to call pcntl_fork() from a web request? That's never going to end well. Your code is terminating unexpectedly and you haven't checked your logs to see what they say? – symcbean Dec 24 '22 at 21:18
  • If you are talking about the laravel.log file, if I wipe it, and then run my code it stays empty. Can you explain why this is "never going to end well"? How else am I to run this code for a website? – joshuasy10 Dec 24 '22 at 22:28
  • If you want to run multiple threads from a web request use curl_multi_exec(). And your webserver error log SHOULD be recording any PHP issues and SHOULD be the first place you look when things go awry. – symcbean Dec 24 '22 at 23:13
  • I am aware of curl_multi_exec(), which would probably be workable; however, these custom API methods I've made do a lot more than just retrieve the data. It would require me to rewrite a massive amount of code. The fork version lets me reuse these existing methods, which is why I'm trying to get it to work. I've made a few edits to my question, as I've found that it might not be an issue with pcntl_fork: it breaks on usleep() even after all processes except the parent have exited. I appreciate you helping me, by the way. – joshuasy10 Dec 24 '22 at 23:48
  • Have you looked at ReactPHP https://reactphp.org/ , AMPHP https://amphp.org (v2), Swoole (https://github.com/swoole/swoole-src/) ? You can go full async with PHP – Dmitry K. Dec 25 '22 at 02:36
  • I've experimented with Amp's async and parallel functions as well as React. This works for simple functionality, but it seems the async code can't see any of the context of the normal system (e.g. config variables), so I landed on pcntl_fork, since it makes duplicate processes with the only difference being the process id. – joshuasy10 Dec 25 '22 at 14:28
