I have a script that uses simplexml_load_string to parse a 658 kB XML file. The file is a property (real estate) feed with 118 different properties totalling 21,000 lines. The script makes many calls like the following to extract data from nodes:
(string)$properties->address->county
I'm also using Advanced Custom Fields (ACF) in the script to update custom fields (post metadata) in WordPress, with many more calls like:
update_field( 'field_59606d60525d3', (string)$properties->floorplans, $post_id );
On a Vagrant VVV box the script runs for over 5 minutes and then times out. In that time it manages to load about 46 of the 118 properties into a custom post type. What I don't know is where the bottleneck is. Is it:
- simplexml parsing the file?
- using update_field in ACF?
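One way to separate the two suspects would be to time the SimpleXML step on its own, with no WordPress or ACF calls involved. The sketch below builds a synthetic feed (the property wrapper element and the function names are made up, and the feed is far smaller than the real file) and times only the parsing and the string casts. If this finishes in a fraction of a second, the parser is unlikely to be the bottleneck.

```php
<?php
// Build a synthetic feed of $count properties. The structure is hypothetical
// and only mimics the address->county nesting used in the real script.
function build_test_feed($count)
{
    $xml = '<properties>';
    for ($i = 0; $i < $count; $i++) {
        $xml .= '<property><address><county>County ' . $i . '</county></address>'
              . '<floorplans>plan-' . $i . '.pdf</floorplans></property>';
    }
    return $xml . '</properties>';
}

// Parse the feed and cast the nodes to strings, returning the extracted
// counties together with the elapsed wall-clock time in seconds.
function time_simplexml_parse($xml)
{
    $start = microtime(true);
    $feed = simplexml_load_string($xml);
    $counties = array();
    foreach ($feed->property as $property) {
        $counties[] = (string) $property->address->county;
    }
    return array('counties' => $counties, 'seconds' => microtime(true) - $start);
}

$result = time_simplexml_parse(build_test_feed(118));
printf("Parsed %d properties in %.4f s\n", count($result['counties']), $result['seconds']);
```

The same microtime(true) bracketing could be placed around a batch of update_field calls in the real script to compare the two figures directly.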
Webgrind (Xdebug) appears to point to a lot of update_meta calls, but I'm not really sure what to look for in a cachegrind file or how to read it.
I suppose what I am asking is twofold: is there a faster alternative to PHP's native SimpleXML, and how does one interpret Xdebug/Webgrind output?
This script will eventually be running on commodity hosting (no VPS/dedicated)
Skill level: procedural (functions, NOT classes)
xdebug output:
Call Stack
# Time Memory Function Location
1 0.2021 361704 {main}( ) .../test.php:0
2 0.6501 5888288 get_xml( ) .../test.php:163
3 544.3322 115472480 update_field( string(19), array(457), long ) .../test.php:115
4 544.3325 115472480 acf_update_value( array(457), long, array(24) ) .../api-template.php:1018
5 544.3325 115472536 apply_filters( string(30), array(457), long, array(24) ) .../api-value.php:350
6 544.3325 115472936 WP_Hook->apply_filters( array(457), array(3) ) .../plugin.php:203
7 544.3326 115473688 acf_field_repeater->update_value( array(457), long, array(24) ) .../class-wp-hook.php:298
8 556.4756 117433368 acf_field_repeater->update_row( array(2), long, array(24), long ) .../repeater.php:900
9 556.4756 117434744 acf_update_value( string(42), long, array(20) ) .../repeater.php:804
10 556.5003 117437600 acf_update_metadata( long, string(15), string(19), true ) .../api-value.php:368
11 556.5004 117438016 update_metadata( string(4), long, string(15), string(19), ??? ) .../api-value.php:101
12 556.5005 117438136 get_metadata( string(4), long, string(15), ??? ) .../meta.php:193
13 556.5005 117438512 update_meta_cache( string(4), array(1) ) .../meta.php:497
14 556.5124 118050992 intval ( string(3) ) .../meta.php:830
UPDATE #1 01/08/2017
I'm at a stage with this where I decided that file_get_contents might be the issue, as each property in the feed has around 10 to 15 image URLs associated with it. 118 properties means just shy of 1800 image URL requests. I tried cURL, then stumbled on curl_multi.
I've now got working code below that runs curl_multi on an array of image URLs, adds them into WP as attachments and attaches them to a specific post_id whilst updating an ACF gallery field. However, I am still none the wiser as to whether this is actually faster or not. How do I time something like this, work out whether curl_multi is actually doing things asynchronously, or check that my code is correct?
require_once( '/srv/www/broadbean/wp-blog-header.php' );
require_once( '/srv/www/broadbean/wp-admin/includes/media.php' );
require_once( '/srv/www/broadbean/wp-admin/includes/file.php' );
require_once( '/srv/www/broadbean/wp-admin/includes/image.php' );
// https://stackoverflow.com/questions/15436388/download-multiple-images-from-remote-server-with-php-a-lot-of-images
// http://php.net/manual/en/function.curl-multi-init.php
$post_id = '2773';
$image_urls = array( 'http://target.domain.net/photos/1334268.jpg', 'http://target.domain.net/photos/1278564.jpg', 'http://target.domain.net/photos/1278565.jpg' );
$chs = array();
$upload_dir = wp_upload_dir();
$tc = count($image_urls);
$cmh = curl_multi_init();
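// Create one cURL handle per image URL and register it with the multi handle.
for ($t = 0; $t < $tc; $t++)
{
    $chs[$t] = curl_init();
    curl_setopt($chs[$t], CURLOPT_URL, $image_urls[$t]);
    //curl_setopt($chs[$t], CURLOPT_FILE, $fp);
    curl_setopt($chs[$t], CURLOPT_FOLLOWLOCATION, 1);
    curl_setopt($chs[$t], CURLOPT_RETURNTRANSFER, 1);
    curl_setopt($chs[$t], CURLOPT_TIMEOUT, 120);
    curl_setopt($chs[$t], CURLOPT_USERAGENT, 'Mozilla/5.0');
    curl_multi_add_handle($cmh, $chs[$t]);
}
// Run all transfers; curl_multi_select waits for activity instead of busy-looping.
$running = null;
do {
    $status = curl_multi_exec($cmh, $running);
    if ($running) {
        curl_multi_select($cmh);
    }
} while ($running && $status == CURLM_OK);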
// Collect the attachment IDs for the ACF gallery field.
$add_gallery_images = array();
for ($t = 0; $t < $tc; $t++)
{
    // Write the downloaded image into the current uploads directory.
    $filename = basename( $image_urls[$t] );
    $image_file = $upload_dir['path'] . '/' . $filename;
    $fp = fopen($image_file, 'w+');
    fwrite($fp, curl_multi_getcontent( $chs[$t] ) );
    fclose($fp);
    // Register the file as an attachment of $post_id and build its metadata.
    $wp_filetype = wp_check_filetype($image_file, null );
    $attachment = array(
        'post_mime_type' => $wp_filetype['type'],
        'post_title' => sanitize_file_name( $filename),
        'post_content' => '',
        'post_status' => 'inherit'
    );
    $attach_id = wp_insert_attachment( $attachment, $image_file, $post_id );
    $attach_data = wp_generate_attachment_metadata( $attach_id, $image_file );
    $update_attach_metadata = wp_update_attachment_metadata( $attach_id, $attach_data );
    $add_gallery_images[] = $attach_id;
    curl_multi_remove_handle($cmh, $chs[$t]);
    curl_close($chs[$t]);
}
var_dump($add_gallery_images);
update_field( 'field_5973027c18fdc', $add_gallery_images , $post_id );
curl_multi_close($cmh);
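To check whether the transfers really overlap, one rough test is to compare the wall-clock time of the curl_multi_exec loop with the sum of each handle's own transfer time, which curl_getinfo( $ch, CURLINFO_TOTAL_TIME ) reports. The sketch below only shows the comparison. The helper name and the sample timings are invented for illustration; in the real script the array would be filled from curl_getinfo before the handles are closed.

```php
<?php
// Hypothetical helper: ratio of summed per-transfer time to wall-clock time.
// A ratio near 1 suggests the downloads ran one after another; a ratio well
// above 1 suggests they overlapped.
function overlap_ratio(array $transfer_seconds, $wall_seconds)
{
    if ($wall_seconds <= 0) {
        return 0.0;
    }
    return array_sum($transfer_seconds) / $wall_seconds;
}

// In the real script these would be collected per handle, for example:
//   $transfer_seconds[] = curl_getinfo($chs[$t], CURLINFO_TOTAL_TIME);
// and $wall_seconds measured with microtime(true) around the exec loop.
// The numbers below are made-up sample values, not measurements.
$transfer_seconds = array(1.5, 2.0, 2.5);
$wall_seconds = 3.0;

printf("Overlap ratio: %.2f\n", overlap_ratio($transfer_seconds, $wall_seconds));
```

This says nothing about the WordPress side: wp_generate_attachment_metadata and update_field still run one image at a time after the downloads finish, so they would need their own microtime(true) measurements.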