We have an application that parses data from external sources and localizes it, saving and resizing images as the final step of the process. Given the volume we process (2 million images to date), we've been using Rackspace Cloud Files to host the data:
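require('/var/libs/rackspace/cloudfiles.php');
// authenticate once and reuse the same connection for every upload
$auth = new CF_Authentication('xxx', 'yyyy');
$auth->authenticate();
$conn = new CF_Connection($auth, true); // second argument true = use ServiceNet (Rackspace's internal network)
$container = $conn->get_container('some container');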
foreach ($lotsofitems as $onitem){
    // check the record
    // save the image to disk with cURL
    // resize it into 4 more versions
    // post each version to rackspace
    foreach (array('_full', '_big', '_med') as $suffix){
        $path = '/var/temp/' . $image_id . $suffix . $image_type;
        if(file_exists($path)){
            $object = $container->create_object($image_id . $suffix . $image_type);
            $object->load_from_filename($path);
            unlink($path); // remove the temp save
        }
    }
    // delete the original
    // repeat
}
After optimizing our parser, GD, etc., we benchmarked the process. Processing an image takes about 1 second, but transferring the 5 image variations to Rackspace takes 2-5 seconds per item and at times spikes to 10+ seconds. Unix timestamps logged for one item:
- get image: 1341964436
- got image: 1341964436
- resized image: 1341964437
- clouded one image: 1341964446
- clouded image: 1341964448
- finished with image: 1341964448
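The log above only has one-second resolution, so it cannot show how the upload time is split across the individual requests. A minimal sketch of finer-grained timing follows, assuming PHP 5.4+; the `timed()` helper is hypothetical (not part of the Cloud Files library), and the `usleep()` call merely stands in for a real step such as `$object->load_from_filename(...)`.

```php
<?php
// Hypothetical helper: runs $step and returns array(result, elapsed seconds).
function timed(callable $step)
{
    $start = microtime(true);          // float seconds, microsecond resolution
    $result = $step();
    return array($result, microtime(true) - $start);
}

// Stand-in for a real step such as $object->load_from_filename($path).
list($result, $elapsed) = timed(function () {
    usleep(20000);                     // simulate roughly 20 ms of work
    return 'ok';
});

printf("upload took %.3f s\n", $elapsed);
```

Wrapping `create_object()` and `load_from_filename()` separately for each variation would show whether the time goes into a few slow requests or is spread evenly across all of them.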
Some additional points:
- Our processing servers are on Rackspace's cloud as well.
- There are 5 image versions in total, ranging from about 30 KB down to 2 KB.
- All images are saved to disk before the transfer and removed afterwards.
- Our containers (we use several overall, but one per item) are CDN-enabled.
Does anyone have suggestions for bulk transfers to Rackspace? Should we be reconnecting after a certain duration or number of requests? Optimizing our connection some other way? Or is it just a matter of forking the process and running many calls in parallel?
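To make the forking option concrete, here is a rough sketch of what it might look like. It assumes the pcntl extension (CLI on a Unix-like system); `process_chunk()` and `run_in_workers()` are hypothetical names, and the body of `process_chunk()` is a placeholder for the check/download/resize/upload loop shown earlier. Each child would need to authenticate and open its own connection rather than share the parent's.

```php
<?php
// Hypothetical stand-in for the per-item work (check, download, resize, upload).
function process_chunk(array $items)
{
    foreach ($items as $item) {
        // A real worker would create its own CF_Authentication / CF_Connection
        // once here, then upload each image version for $item.
    }
    return count($items);
}

// Split $items into at most $workers chunks and process each in a child process.
// Returns the number of children that exited cleanly.
function run_in_workers(array $items, $workers)
{
    $size = max(1, (int) ceil(count($items) / $workers));
    $pids = array();
    foreach (array_chunk($items, $size) as $chunk) {
        $pid = pcntl_fork();
        if ($pid === -1) {
            throw new RuntimeException('fork failed');
        }
        if ($pid === 0) {              // child: do the work, then exit
            process_chunk($chunk);
            exit(0);
        }
        $pids[] = $pid;                // parent: remember the child
    }
    $finished = 0;
    foreach ($pids as $pid) {
        pcntl_waitpid($pid, $status);  // block until this child ends
        if (pcntl_wifexited($status) && pcntl_wexitstatus($status) === 0) {
            $finished++;
        }
    }
    return $finished;
}

$done = run_in_workers(range(1, 10), 4);
echo "$done workers finished cleanly\n";
```

Whether this helps depends on where the 2-5 seconds actually go: it would hide per-request latency by overlapping uploads, but it would not reduce the latency of any single request.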