
I'm hoping someone can help me here. I'm building a WordPress plugin that pulls data from an XML feed and stores it in a database table. The data includes images, so the plugin also downloads all of the images into WordPress's "uploads" folder, so that the front end of the site doesn't have to make remote calls to display them.

When pulling in about 100 or so items from the XML feed, it's OK. But in some cases there may be 500 to 1000 items that need to be pulled in, and that takes so long that the request ends in a 504 Gateway Timeout error.

Here is the function I'm running:

function add_items_to_database(){
    global $wpdb;
    $items_table = $wpdb->prefix . "stored_items";

    // CREATE ARRAY FROM SUBMITTED ITEM ID'S //
    $item_ids = explode(',',$_POST['item_ids']);

    // GET EACH ITEMS FROM XML FEED AND STORE THEM IN THE WEBSITE DATABASE //
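    foreach($item_ids as $item_id){
        // Reset per feed so the page loop below counts only this feed's items
        // and items from earlier feeds are not inserted again.
        $allItems = array();
        $i=0;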

        do{
            $request_url = 'https://example.com/Item/Rss?ouid='.$item_id.'&pageindex=' . $i . '&searchresultsperpage=thirty';
            $results = getItems($request_url);
            $xml = simplexml_load_string($results, "SimpleXMLElement", LIBXML_NOCDATA);
            $json = json_encode($xml);
            $array = json_decode($json,TRUE);
            $itemsCount = $array['items']['@attributes']['totalcount'];
            foreach($array['entry'] as $item){
                $allItems[] = $item['item']['@attributes']['itemid'];
            }
            $i++;
        } while (count($allItems) < $itemsCount);

        foreach($allItems as $item_id){
            $item = get_item($item_id);
            $fields = get_item_fields($item);
            $wpdb->insert($items_table,$fields);
        }
    }
}

So here, "getItems" is another function which fetches one page of the XML feed, and from those pages I build an array of all the item IDs. Then for each of those IDs, "get_item" grabs all the XML data for that item. "get_item_fields" then assigns each bit of data from the XML feed to a PHP array variable called $fields; it also downloads all of the images and stores their new local URLs in $fields as well. The contents of $fields are then saved into the database.

Now, of course, it appears that it's the downloading of all the image files which is hanging the system up the longest and causing the gateway timeout issue.

After a bit of Googling, the suggestion seems to be that I could fix this by using "curl_multi" to run all of these downloads in parallel. I'm really not familiar with cURL at all and I'm having a bit of trouble getting my head around it. I'm hoping someone might be able to shed some light on how I could alter the above code to correctly use "curl_multi", and whether or not that really is the way to go?
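
For reference, this is roughly the shape a parallel download with "curl_multi" seems to take, as far as the PHP manual describes it. It is only a minimal sketch: the function name download_urls_parallel, its parameters and its return format are hypothetical and not part of the plugin, and treating any non-2xx status as a failure is an assumption.

```php
<?php
/**
 * Fetch several URLs in parallel with curl_multi.
 * Hypothetical helper: name, parameters and return shape are assumptions.
 *
 * @param string[] $urls    URLs to fetch; array keys are preserved.
 * @param int      $timeout Per-transfer timeout in seconds.
 * @return array   Same keys, each mapped to the response body or false.
 */
function download_urls_parallel(array $urls, int $timeout = 30): array
{
    $results = [];
    if (empty($urls)) {
        return $results;
    }

    $multi   = curl_multi_init();
    $handles = [];

    // One easy handle per URL, all attached to the same multi handle.
    foreach ($urls as $key => $url) {
        $ch = curl_init($url);
        curl_setopt($ch, CURLOPT_RETURNTRANSFER, true); // needed for curl_multi_getcontent()
        curl_setopt($ch, CURLOPT_FOLLOWLOCATION, true);
        curl_setopt($ch, CURLOPT_TIMEOUT, $timeout);
        curl_multi_add_handle($multi, $ch);
        $handles[$key] = $ch;
    }

    // Drive all transfers until none are still running.
    do {
        $status = curl_multi_exec($multi, $running);
        if ($running > 0 && curl_multi_select($multi, 1.0) === -1) {
            usleep(100000); // select failed; back off briefly instead of spinning
        }
    } while ($running > 0 && $status === CURLM_OK);

    // Collect the bodies, treating anything other than a 2xx status as a failure.
    foreach ($handles as $key => $ch) {
        $code = curl_getinfo($ch, CURLINFO_HTTP_CODE);
        $results[$key] = ($code >= 200 && $code < 300)
            ? curl_multi_getcontent($ch)
            : false;
        curl_multi_remove_handle($multi, $ch);
        curl_close($ch);
    }
    curl_multi_close($multi);

    return $results;
}
```

Each returned body could then be written to the uploads folder, for example with file_put_contents() and the path from wp_upload_dir(). Parallel transfers shorten the wall-clock time of the downloads, but the whole import still runs inside a single request, so a very large feed could presumably still hit the gateway timeout.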

Thanks in advance.

Graphic Detail
  • What triggers this import? Is it the WordPress cron? I think you're going to need to batch your inserts. Download your feed XML, cache it somewhere, and then iterate over the items in chunks of 100 or whatever you deem appropriate. This might need to work as a background job. – Scuzzy Apr 16 '21 at 04:13
  • You may use some asynchronous task. See https://torquemag.io/2016/01/use-asynchronous-php-wordpress/ – Jean-Baptiste Yunès Apr 16 '21 at 04:15
  • It's not being triggered by a cron at this stage; for now it is manually triggered with the click of a button. I've been reading all the information I can find on asynchronous PHP but I'm just not understanding it. I was kind of hoping some kind, brilliant genius out there might be able to look at my code above and spit a modified version of it back at me with the async code added in a way that will work. – Graphic Detail Apr 19 '21 at 21:38
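
The batching idea from the first comment could be sketched roughly as below. This is an illustration under assumptions, not a tested fix: process_next_batch and its $import_item callback are hypothetical names, and the callback stands in for whatever does the per-item work (in the question, get_item, get_item_fields and the insert).

```php
<?php
/**
 * Import one batch of queued item IDs and return the IDs still waiting.
 * Hypothetical helper: name and parameters are assumptions.
 *
 * @param array    $queued_ids  All item IDs that still need importing.
 * @param int      $batch_size  How many IDs to handle in this call.
 * @param callable $import_item Called once per ID; does the actual import.
 * @return array   The IDs left over for the next batch (re-indexed from 0).
 */
function process_next_batch(array $queued_ids, int $batch_size, callable $import_item): array
{
    // Take only the first $batch_size IDs for this run.
    $batch = array_slice($queued_ids, 0, $batch_size);

    foreach ($batch as $id) {
        $import_item($id);
    }

    // Everything after the processed batch is returned to the caller.
    return array_slice($queued_ids, count($batch));
}
```

In WordPress the remaining IDs could be kept between runs with update_option(), and the next batch could be queued with wp_schedule_single_event(), so that no single request has to import everything. The batch size of 100 mentioned above is only a starting point.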
