We are using PHP with CodeIgniter to import millions of images from hundreds of sources, resizing them locally and then uploading the resized versions to Amazon S3. However, the process is taking much longer than expected, and we're looking for ways to speed things up. In more detail:
- A lookup is made in our MySQL database table for images which have not yet been resized. The result is a set of images.
- Each image is imported individually using cURL and temporarily stored on our server during processing. They have to be imported locally because the library doesn't allow resizing/cropping of external images. In some tests, the time for the entire process (using 200 images per test) has varied between 80 and 140 seconds depending on the external source, so the source can definitely slow things down. A simplified sketch of the lookup and import steps is shown after this list.
- The current image is resized using the image_moo library, which creates a copy of the image.
- The resized image is uploaded to Amazon S3 using a CodeIgniter S3 library.
- The S3 URL for the new resized image is then saved in the database table before starting with the next image.
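To make the first two steps concrete, here is a minimal sketch of roughly how the lookup and the cURL import work; the table and column names (images, id, source_url, resized), the batch size and the local ./upload/ path are placeholders rather than our exact schema:

// Fetch a batch of images that have not been resized yet
$images = $this->db
    ->select('id, source_url')
    ->where('resized', 0)
    ->limit(200)
    ->get('images')
    ->result();

foreach ($images as $image) {
    // Download the original with cURL and store it temporarily on our server
    $picloc = './upload/'.$image->id;
    $fp = fopen($picloc.'.jpg', 'wb');
    $ch = curl_init($image->source_url);
    curl_setopt($ch, CURLOPT_FILE, $fp);
    curl_setopt($ch, CURLOPT_FOLLOWLOCATION, TRUE);
    curl_setopt($ch, CURLOPT_TIMEOUT, 30);
    curl_exec($ch);
    curl_close($ch);
    fclose($fp);

    // ... resize with image_moo, upload to S3 and update the row (see below)
}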
The process takes 0.5-1 second per image, which means all current images would take about a month to resize and upload to S3 (4 million images at ~0.75 s each is roughly 35 days). The bigger problem is that we are constantly adding new image sources and expect to have at least 30-50 million images before the end of 2011, compared to the roughly 4 million we had at the start of May.
I have noticed one answer on Stack Overflow which might be a good complement to our solution, where images are resized and uploaded on the fly. However, since we don't want any unnecessary delay when people visit pages, we need to make sure that as many images as possible are already uploaded. Besides this, we want multiple size formats of the images, but currently only upload the most important one because of this speed issue. Ideally we would have at least three size formats (for example a thumbnail, a normal and a large version) for each imported image.
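To give an idea of what the three formats would mean in code, something like the following loop around the snippet further down is what we have in mind (the dimensions are just placeholders, and it simply re-loads the local original once per size):

// Desired output sizes (placeholder dimensions)
$sizes = array(
    'thumb'  => array(150, 150),
    'normal' => array(500, 375),
    'large'  => array(1024, 768),
);

foreach ($sizes as $label => $dim) {
    list($width, $height) = $dim;
    $newpic = $picloc.'-'.$width.'x'.$height.'.jpg';

    // Load the local original again for each format and save a resized copy
    $this->image_moo
        ->load($picloc.'.jpg')
        ->resize($width, $height, TRUE)
        ->save($newpic, 'jpg');

    // ... each $newpic would then be uploaded to S3 as in the snippet below
}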
Someone suggested making bulk uploads to S3 a few days ago; any experience of how much time this could save would be helpful.
Replies to any part of the question would be helpful if you have experience of a similar process. Part of the code (simplified):
// Build the target filename for this particular size
$newpic = $picloc.'-'.$width.'x'.$height.'.jpg';

// Resize the locally stored original with image_moo and save a copy
$pic = $this->image_moo
    ->load($picloc.'.jpg')
    ->resize($width, $height, TRUE)
    ->save($newpic, 'jpg');

if ($this->image_moo->errors) {
    // Do stuff if something goes wrong, for example if the image no longer exists -
    // this doesn't happen very often so it is not a great concern
}
else {
    // Upload the resized copy to S3 with a public-read ACL
    if (S3::putObject(
        S3::inputFile($newpic),
        'someplace',
        str_replace('./upload/', '', $newpic),
        S3::ACL_PUBLIC_READ,
        array(),
        array(
            'Content-Type' => 'image/jpeg',
        )))
    {
        // Save the URL to the resized image in the database, unlink the local files etc,
        // then start with the next image
    }
}