We have an application that parses data from external sources and localizes it, saving and resizing images as the final step of the process. Given the volume we process (2 million images to date), we've been using Rackspace Cloud Files to host the data:

require('/var/libs/rackspace/cloudfiles.php');
$auth = new CF_Authentication('xxx', 'yyyy');
$auth->authenticate();
$conn = new CF_Connection($auth,true);
$container = $conn->get_container('some container');

foreach ($lotsofitems as $onitem){

    // check the record
    // save the image to disk with cURL
    // resize it into 4 more versions
    // post it to rackspace

    if(file_exists('/var/temp/'. $image_id . '_full'. $image_type)){
        $object = $container->create_object($image_id . '_full' . $image_type);
        $object->load_from_filename('/var/temp/'. $image_id . '_full' . $image_type);
        unlink('/var/temp/'. $image_id . '_full' . $image_type); // remove the temp save
    }

    if(file_exists('/var/temp/'. $image_id . '_big'. $image_type)){
        $object = $container->create_object($image_id . '_big' . $image_type);
        $object->load_from_filename('/var/temp/'. $image_id . '_big' . $image_type);
        unlink('/var/temp/'. $image_id . '_big' . $image_type); // remove the temp save
    }

    if(file_exists('/var/temp/'. $image_id . '_med'. $image_type)){
        $object = $container->create_object($image_id . '_med' . $image_type);
        $object->load_from_filename('/var/temp/'. $image_id . '_med' . $image_type);
        unlink('/var/temp/'. $image_id . '_med' . $image_type); // remove the temp save
    }

    // delete the original
    // repeat

}

After optimizing our parser, GD, etc., we benchmarked the process: processing an image takes about 1 second, but transferring the 5 image variations to Rackspace takes 2-5 seconds per item and at times spikes to 10+ seconds.

  • get image: 1341964436
  • got image: 1341964436
  • resized image: 1341964437
  • clouded one image: 1341964446
  • clouded image: 1341964448
  • finished with image: 1341964448
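
Subtracting consecutive timestamps in the log above makes the split explicit. The snippet below is only a worked calculation on those six values; `phase_durations` is a hypothetical helper name, not part of our application.

```php
<?php
// Timestamps copied from the log above (Unix seconds).
$marks = array(
    'get image'           => 1341964436,
    'got image'           => 1341964436,
    'resized image'       => 1341964437,
    'clouded one image'   => 1341964446,
    'clouded image'       => 1341964448,
    'finished with image' => 1341964448,
);

// Hypothetical helper: seconds elapsed between each mark and the one before it.
function phase_durations(array $marks): array
{
    $durations = array();
    $previous = null;
    foreach ($marks as $label => $timestamp) {
        if ($previous !== null) {
            $durations[$label] = $timestamp - $previous;
        }
        $previous = $timestamp;
    }
    return $durations;
}

foreach (phase_durations($marks) as $label => $seconds) {
    echo $label . ': ' . $seconds . "s\n";
}
```

In this sample the resize takes 1 second while the two upload steps take 9 and 2 seconds, so the transfer accounts for 11 of the 12 seconds.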

Some additional points:

  1. Our processing servers are on Rackspace's cloud as well.
  2. There are 5 image versions in total, ranging from around 30 KB down to 2 KB.
  3. All images are saved to disk before the transfer and removed afterwards.
  4. Our containers [we use several overall but one per item] are CDN enabled.

Does anyone have suggestions for bulk transfers to Rackspace? Should we be reconnecting after a certain duration or number of requests? Optimizing our connection some other way? Or is it just a matter of forking the processes and running a lot of calls in parallel?
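
For the forking option, a minimal sketch of what is meant, assuming PHP runs from the CLI with the `pcntl` extension available. `process_in_parallel` and its parameters are hypothetical names for illustration. The `$work` callback would wrap the per-item download, resize and upload, and each child would presumably need to open its own Cloud Files connection rather than reuse the parent's.

```php
<?php
// Hypothetical helper: calls $work($job) for every job, keeping at most
// $maxWorkers forked children alive at once. Returns the number of jobs handled.
// Falls back to a plain sequential loop when pcntl is not available.
function process_in_parallel(array $jobs, int $maxWorkers, callable $work): int
{
    if (!function_exists('pcntl_fork')) {
        foreach ($jobs as $job) {
            $work($job);
        }
        return count($jobs);
    }

    $running = 0;
    $done = 0;
    foreach ($jobs as $job) {
        // Pool is full: block until one child exits before forking another.
        if ($running >= $maxWorkers) {
            pcntl_wait($status);
            $running--;
            $done++;
        }

        $pid = pcntl_fork();
        if ($pid === -1) {
            // Fork failed: do the work in the parent instead.
            $work($job);
            $done++;
        } elseif ($pid === 0) {
            // Child: handle exactly one job, then exit.
            $work($job);
            exit(0);
        } else {
            $running++;
        }
    }

    // Reap the remaining children.
    while ($running > 0) {
        pcntl_wait($status);
        $running--;
        $done++;
    }
    return $done;
}
```

Called as `process_in_parallel($lotsofitems, 8, $callback)`, this would keep up to 8 items in flight; the worker count of 8 is an arbitrary starting point, not a measured value.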

Ted S
  • What size are the images you're uploading? Could be that... I doubt it though. Have you spoken to Rackspace Support? Might be worth logging a ticket or using the Live Chat. – ajtrichards Jul 11 '12 at 21:19
  • The images are fairly small... under 50kb for the full size, a couple kb for the thumbs. Rackspace support had a few ideas but those are in place and we're still slower than we'd like... – Ted S Jul 12 '12 at 01:01

1 Answer

Have you tried using CloudFuse? It lets you mount Rackspace Cloud Files containers as a local filesystem.

I have used this and it's pretty good; the guy who made it works for Rackspace.

http://sandeepsidhu.wordpress.com/2011/03/07/mounting-cloud-files-using-cloudfuse-into-ubuntu-10-10-v2/
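
Once a container is mounted, the upload step could become an ordinary file copy. The following is a rough sketch only: the mount path and the helper name `move_variants_to_mount` are assumptions for illustration, and the suffixes are the ones from the question's code. As far as I know CloudFuse still talks to the Cloud Files API behind the scenes, so it is worth benchmarking rather than assuming it will be faster.

```php
<?php
// Hypothetical helper: copies each resized variant from the temp directory
// into a directory where the container is assumed to be mounted via CloudFuse,
// then removes the temp file. Returns the names that were copied.
function move_variants_to_mount(
    string $tempDir,
    string $mountDir,
    string $imageId,
    string $imageType,
    array $suffixes
): array {
    $moved = array();
    foreach ($suffixes as $suffix) {
        $name = $imageId . $suffix . $imageType;
        $source = $tempDir . '/' . $name;
        if (!file_exists($source)) {
            continue; // this variant was not generated
        }
        // Delete the temp file only after the copy succeeded,
        // so a failed write does not lose the image.
        if (copy($source, $mountDir . '/' . $name)) {
            unlink($source);
            $moved[] = $name;
        }
    }
    return $moved;
}
```

A call might look like `move_variants_to_mount('/var/temp', '/mnt/cloudfiles/some container', $image_id, $image_type, array('_full', '_big', '_med'))`, where `/mnt/cloudfiles` is a made-up mount point.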

ajtrichards
  • Very interesting. We'll take a look and see if the mount option creates a direct enough connection to speed things up... Thanks! – Ted S Jul 26 '12 at 04:32