
I have a RESTful web service running on Amazon EC2. Since my application needs to deal with a large number of photos, I plan to put them on Amazon S3. The URL for retrieving a photo from S3 could look like this:

http://johnsmith.s3.amazonaws.com/photos/puppy.jpg

Is there any way, or any necessity, to cache the images on EC2? The pros and cons I can think of are: 1) reduced S3 usage and cost, with improved image-fetching performance; on the other hand, the EC2 cost can rise, and EC2 may not be able to handle the image cache due to bandwidth restrictions. 2) Increased development complexity, because you need to check the cache first, ask S3 to transfer the image to EC2, and then transfer it to the client.

I'm using an EC2 micro instance and feel it might be better not to do the image caching on EC2. But the scale might grow fast, and eventually an image cache will be needed. (Am I right?) If a cache is needed, is it better to do it on EC2 or on S3? (Is there a way to cache on S3?)

By the way, when the client uploads an image, should it be uploaded to EC2, or directly to S3?

Robin Sun
  • For a number of reasons, a bespoke solution may suit you better. Review your requirements and compare them against the CDN solutions that are available. – Anatoly Feb 27 '15 at 00:22

3 Answers


Static vs dynamic

Generally speaking, here are the tiers:

best  CDN (CloudFront)
good  static hosting (S3)
okay  dynamic (EC2)

Why? There are a few reasons.

  • maintainability and scalability: CloudFront and S3 scale "for free". You don't need to worry about capacity, bandwidth, or request rate.
  • price: approximately speaking, it's cheaper to use S3 than EC2.
  • latency: CDNs are located around the world, leading to shorter load times.

Caching

No matter where you are serving your static content from, proper use of the Cache-Control header will make life better. With that header you can tell a browser how long the content is good for. If it is something that never changes, you can instruct a browser to keep it for a year. If it frequently changes, you can instruct a browser to keep it for an hour, or a minute, or revalidate every time. You can give similar instructions to a CDN.

Here are some examples:

# keep for one year (31536000 seconds)
Cache-Control: max-age=31536000

# keep for a day on a CDN, but a minute in client browsers
Cache-Control: s-maxage=86400, max-age=60

You can add this header to pages served from your EC2 instance (whether it's nginx, Tornado, Tomcat, or IIS), you can add it to the metadata on S3 objects, and CloudFront will respect these values.
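For instance, here is a minimal sketch (boto3 assumed; the bucket and key are placeholders taken from the question's example URL) of setting Cache-Control on an S3 object at upload time:

# a minimal sketch, assuming boto3; bucket and key names are placeholders
import boto3

s3 = boto3.client("s3")
s3.upload_file(
    "puppy.jpg",                             # local file to upload
    "johnsmith",                             # placeholder bucket name
    "photos/puppy.jpg",                      # S3 object key
    ExtraArgs={
        "ContentType": "image/jpeg",
        "CacheControl": "max-age=31536000",  # keep for one year
    },
)

Every object uploaded this way is then served by S3 (and passed through by CloudFront) with that Cache-Control header already set.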

I would not pull the images from S3 to EC2 and then serve them. It's wasted effort. There are only a small number of use cases where that makes sense.

tedder42

Why bring EC2 into the equation? I strongly recommend using CloudFront for this scenario.

When you use CloudFront with S3 as the origin, the content gets distributed to 49 different locations worldwide (the count of edge locations as of today), directly working as a global cache, with the content being fetched from the nearest location based on latency to your end users.

This way you don't need to worry about the scale and performance of the cache or of EC2; you can simply offload all of this to CloudFront and S3.

Naveen Vijay
  • I have been using CloudFront. The only drawback I see is the length of the signed URLs that CloudFront generates, and passing these to the client (web and Android) is another overhead. Any ideas on how to overcome this, or do you think it's not an issue at all? – Harshit Nov 28 '17 at 13:55
  • The length of your CloudFront-backed URLs is inconsequential. You will find there are many utilities for this that keep you from having to sign your resources yourself. For me, django-storages handles all of it. – Derek Adair Jun 02 '23 at 12:23
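For reference, here is a minimal sketch of generating a CloudFront signed URL with botocore's CloudFrontSigner; the key pair ID, private-key path, and distribution domain below are placeholder assumptions:

# a minimal sketch; key pair ID, key path, and domain are placeholders
from datetime import datetime, timedelta

from botocore.signers import CloudFrontSigner
from cryptography.hazmat.primitives import hashes, serialization
from cryptography.hazmat.primitives.asymmetric import padding

def rsa_signer(message):
    # sign the CloudFront policy with the key pair's private RSA key
    with open("private_key.pem", "rb") as f:
        key = serialization.load_pem_private_key(f.read(), password=None)
    return key.sign(message, padding.PKCS1v15(), hashes.SHA1())

signer = CloudFrontSigner("APKAEXAMPLE", rsa_signer)

# URL is valid for one hour from now
signed_url = signer.generate_presigned_url(
    "https://d111111abcdef8.cloudfront.net/photos/puppy.jpg",
    date_less_than=datetime.utcnow() + timedelta(hours=1),
)
print(signed_url)

A library such as django-storages wraps exactly this kind of signing so you don't have to manage it yourself.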

A few scenarios where an EC2 caching instance makes sense:

  • your upload/download ratio is far from 50/50

  • you hit the S3 limit of 100 requests/sec

  • you need URL masking

  • you want to optimise kernel and TCP/IP settings, or cache SSL sessions for clients

  • you want a proper cache-invalidation mechanism for all geo locations

  • you need 100% control over where data is stored

  • you need to count the number of requests

  • you have a custom authentication mechanism

For a number of reasons, I recommend taking a look at an Nginx S3 proxy; the idea is sketched below.
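To make the idea concrete, here is a minimal sketch of such a caching proxy, written in Python with Flask and boto3 for illustration rather than Nginx (the bucket name and cache directory are placeholder assumptions): serve from the local disk cache if the image is present, otherwise fetch it from S3 first.

# a minimal sketch, assuming Flask and boto3; names are placeholders
import os

import boto3
from flask import Flask, send_file

app = Flask(__name__)
s3 = boto3.client("s3")
BUCKET = "johnsmith"             # placeholder bucket name
CACHE_DIR = "/var/cache/images"  # placeholder local cache directory

@app.route("/photos/<path:key>")
def photo(key):
    local_path = os.path.join(CACHE_DIR, key)
    if not os.path.exists(local_path):
        # cache miss: pull the object from S3 onto local disk
        os.makedirs(os.path.dirname(local_path), exist_ok=True)
        s3.download_file(BUCKET, "photos/" + key, local_path)
    # serve from local disk; tell browsers to keep it for an hour
    return send_file(local_path, max_age=3600)

A production version would also need cache eviction, concurrency handling, and path sanitisation, which is part of why a battle-tested Nginx proxy is usually the better choice.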

Anatoly