
I have a script that uses simplexml_load_string to parse a 658 kB XML file. The file is a property (real estate) feed with 118 properties totalling 21,000 lines. The script makes many calls like the following to extract data from nodes:

(string)$properties->address->county

I'm also using Advanced Custom Fields (ACF) in the script to update custom meta fields in WordPress, which means many more calls like:

update_field( 'field_59606d60525d3', (string)$properties->floorplans, $post_id );

On a Vagrant VVV box the script runs for over 5 minutes and then times out. In that time it loads about 46 of the 118 properties into a custom post type. What I don't know is where the bottleneck is. Is it:

  • simplexml parsing the file?
  • using update_field in ACF?

Webgrind (Xdebug) appears to point to a lot of update_meta calls, but I'm not really sure what to look for or how to read a cachegrind file.

I suppose what I am asking is: is there a faster alternative to PHP's native SimpleXML, and how does one interpret Xdebug/Webgrind output?
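One rough way to separate the two suspects without reading a cachegrind file is to time each phase on its own. The sketch below is only an illustration: `build_sample_feed()` and `time_phase()` are hypothetical helpers, and the synthetic feed only mimics the `address->county` and `floorplans` nodes mentioned above, not the real feed layout. It measures the SimpleXML parse and the node reads without touching WordPress.

```php
<?php
// Hypothetical helper: builds a synthetic feed with $count properties.
// The <properties>/<property>/<id> element names are assumptions for illustration.
function build_sample_feed(int $count): string
{
    $xml = '<?xml version="1.0" encoding="UTF-8"?><properties>';
    for ($i = 0; $i < $count; $i++) {
        $xml .= '<property><id>' . $i . '</id>'
              . '<address><county>County ' . $i . '</county></address>'
              . '<floorplans>plan-' . $i . '.pdf</floorplans></property>';
    }
    return $xml . '</properties>';
}

// Hypothetical helper: runs one phase and returns array(result, elapsed seconds).
function time_phase(callable $phase): array
{
    $start  = microtime(true);
    $result = $phase();
    return array($result, microtime(true) - $start);
}

$xml = build_sample_feed(118);

// Phase 1: parsing only.
list($feed, $parse_seconds) = time_phase(function () use ($xml) {
    return simplexml_load_string($xml);
});

// Phase 2: reading nodes only (no database writes).
list($counties, $read_seconds) = time_phase(function () use ($feed) {
    $out = array();
    foreach ($feed->property as $property) {
        $out[] = (string) $property->address->county;
    }
    return $out;
});

printf(
    "parse: %.4fs, read: %.4fs, properties: %d\n",
    $parse_seconds,
    $read_seconds,
    count($counties)
);
```

Wrapping the real update_field calls in the same `time_phase()` helper would give a third figure to compare against the first two.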

This script will eventually be running on commodity hosting (no VPS/dedicated).

Skill level: procedural (functions, NOT classes)

Xdebug output:

Call Stack
#   Time      Memory     Function                                                           Location
1   0.2021    361704     {main}( )                                                          .../test.php:0
2   0.6501    5888288    get_xml( )                                                         .../test.php:163
3   544.3322  115472480  update_field( string(19), array(457), long )                       .../test.php:115
4   544.3325  115472480  acf_update_value( array(457), long, array(24) )                    .../api-template.php:1018
5   544.3325  115472536  apply_filters( string(30), array(457), long, array(24) )           .../api-value.php:350
6   544.3325  115472936  WP_Hook->apply_filters( array(457), array(3) )                     .../plugin.php:203
7   544.3326  115473688  acf_field_repeater->update_value( array(457), long, array(24) )    .../class-wp-hook.php:298
8   556.4756  117433368  acf_field_repeater->update_row( array(2), long, array(24), long )  .../repeater.php:900
9   556.4756  117434744  acf_update_value( string(42), long, array(20) )                    .../repeater.php:804
10  556.5003  117437600  acf_update_metadata( long, string(15), string(19), true )          .../api-value.php:368
11  556.5004  117438016  update_metadata( string(4), long, string(15), string(19), ??? )    .../api-value.php:101
12  556.5005  117438136  get_metadata( string(4), long, string(15), ??? )                   .../meta.php:193
13  556.5005  117438512  update_meta_cache( string(4), array(1) )                           .../meta.php:497
14  556.5124  118050992  intval ( string(3) )                                               .../meta.php:830

UPDATE #1 01/08/2017

I'm at a stage with this where I decided that file_get_contents might be the issue, as each property in the feed has around 10 to 15 image URLs associated with it. 118 properties means just shy of 1,800 image URL requests. I tried cURL, then stumbled on curl_multi.

I've now got working code (below) that runs curl_multi on an array of image URLs, adds the images into WP as attachments and attaches them to a specific post_id whilst updating an ACF gallery field. However, I am still none the wiser on whether this is actually faster. How do I time something like this, work out whether curl_multi is actually doing things asynchronously, or check that my code is correct?

require_once( '/srv/www/broadbean/wp-blog-header.php' );
require_once( '/srv/www/broadbean/wp-admin/includes/media.php' );
require_once( '/srv/www/broadbean/wp-admin/includes/file.php' );
require_once( '/srv/www/broadbean/wp-admin/includes/image.php' );

// https://stackoverflow.com/questions/15436388/download-multiple-images-from-remote-server-with-php-a-lot-of-images
// http://php.net/manual/en/function.curl-multi-init.php

$post_id = '2773';
$image_urls = array( 'http://target.domain.net/photos/1334268.jpg', 'http://target.domain.net/photos/1278564.jpg', 'http://target.domain.net/photos/1278565.jpg' );
$chs = array();
$add_gallery_images = array(); // attachment IDs collected for the ACF gallery field
$upload_dir = wp_upload_dir();
$tc = count($image_urls);
$cmh = curl_multi_init();

for ($t = 0; $t < $tc; $t++)
{
    $chs[$t] = curl_init();
    curl_setopt($chs[$t], CURLOPT_URL, $image_urls[$t]);
    //curl_setopt($chs[$t], CURLOPT_FILE, $fp);
    curl_setopt($chs[$t], CURLOPT_FOLLOWLOCATION, 1);
    curl_setopt($chs[$t], CURLOPT_RETURNTRANSFER, 1);
    curl_setopt($chs[$t], CURLOPT_TIMEOUT, 120);
    curl_setopt($chs[$t], CURLOPT_USERAGENT, 'Mozilla/5.0');
    curl_multi_add_handle($cmh, $chs[$t]);
}

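$running = null;

do {
    $status = curl_multi_exec($cmh, $running);
    // Wait for activity on any handle instead of spinning the CPU in a tight loop.
    if ($running) {
        curl_multi_select($cmh, 1.0);
    }
} while ($running && $status === CURLM_OK);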


for ($t = 0; $t < $tc; $t++)
{
    $filename = basename( $image_urls[$t] );
    $image_file = $upload_dir['path'] . '/' . $filename;
    $fp = fopen($image_file, 'w+');

    fwrite($fp, curl_multi_getcontent( $chs[$t] ) );
    fclose($fp);

    $wp_filetype = wp_check_filetype($image_file, null );

    $attachment = array(
        'post_mime_type' => $wp_filetype['type'],
        'post_title' => sanitize_file_name( $filename),
        'post_content' => '',
        'post_status' => 'inherit'
    );


    $attach_id = wp_insert_attachment( $attachment, $image_file, $post_id );
    $attach_data = wp_generate_attachment_metadata( $attach_id, $image_file );
    $update_attach_metadata = wp_update_attachment_metadata( $attach_id, $attach_data );

    $add_gallery_images[] = $attach_id;

    curl_multi_remove_handle($cmh, $chs[$t]);
    curl_close($chs[$t]);

}
var_dump($add_gallery_images);
update_field( 'field_5973027c18fdc', $add_gallery_images , $post_id );

curl_multi_close($cmh);
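
One possible way to check whether the transfers really overlap is to compare the wall-clock time of the whole batch with the sum of each handle's own CURLINFO_TOTAL_TIME. If the sum is clearly larger than the wall-clock time, the downloads ran concurrently. This is a minimal sketch, not a drop-in replacement for the code above: `fetch_concurrently()` and the keys of its return array are hypothetical names, and it assumes the cURL extension is loaded.

```php
<?php
// Hypothetical helper: downloads all URLs with curl_multi and reports timing.
function fetch_concurrently(array $urls, int $timeout = 120): array
{
    $mh      = curl_multi_init();
    $handles = array();

    foreach ($urls as $key => $url) {
        $ch = curl_init();
        curl_setopt($ch, CURLOPT_URL, $url);
        curl_setopt($ch, CURLOPT_RETURNTRANSFER, true);
        curl_setopt($ch, CURLOPT_FOLLOWLOCATION, true);
        curl_setopt($ch, CURLOPT_TIMEOUT, $timeout);
        curl_multi_add_handle($mh, $ch);
        $handles[$key] = $ch;
    }

    $start   = microtime(true);
    $running = 0;
    do {
        $status = curl_multi_exec($mh, $running);
        if ($running > 0) {
            curl_multi_select($mh, 1.0); // sleep until a socket has activity
        }
    } while ($running > 0 && $status === CURLM_OK);
    $wall = microtime(true) - $start;

    // Per-transfer result codes are reported via curl_multi_info_read().
    $errors = array();
    while (($info = curl_multi_info_read($mh)) !== false) {
        foreach ($handles as $key => $ch) {
            if ($ch === $info['handle']) {
                $errors[$key] = $info['result']; // CURLE_OK (0) on success
            }
        }
    }

    $bodies = array();
    $sum    = 0.0;
    foreach ($handles as $key => $ch) {
        $sum += curl_getinfo($ch, CURLINFO_TOTAL_TIME);
        $ok = isset($errors[$key]) && $errors[$key] === CURLE_OK;
        $bodies[$key] = $ok ? curl_multi_getcontent($ch) : null;
        curl_multi_remove_handle($mh, $ch);
        curl_close($ch);
    }
    curl_multi_close($mh);

    return array(
        'bodies'       => $bodies, // null marks a failed transfer
        'wall_seconds' => $wall,   // elapsed time for the whole batch
        'sum_seconds'  => $sum,    // sum of each transfer's own total time
    );
}
```

Calling `fetch_concurrently($image_urls)` and printing `wall_seconds` next to `sum_seconds` gives a rough answer. Timing a plain loop of single curl_exec calls over the same URLs with microtime(true) gives the sequential baseline to compare against.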
essexboyracer

1 Answer


Not really a specific answer to your problem, but since you're using WordPress, have you tried WP All Import? It has a good implementation for ACF too (I think you need the pro version in that case).

robinl
  • I hadn't but that is a good point. I have a colleague that uses it to import MS Access exports into ACF to good effect (about 4000 records). What's different though between WP All Import and doing it 'by hand'? Perhaps I need to post a code snippet? – essexboyracer Jul 15 '17 at 22:22
  • Well, the main benefit is that the WP All Import plugin is very well optimized for exactly this kind of operation. If you really want to do this without an existing plugin or library, then you should know all the dos and don'ts for WordPress (and in this case Advanced Custom Fields), which will probably result in writing custom MySQL queries. – robinl Jul 15 '17 at 23:16