8

I have a Django project which uses django-storages over s3-boto.

The problem is that no file located on S3 can be cached, because the URL changes on every call.

Here are two URLs generated by django-storages:

https://my.s3.amazonaws.com/cache/user_6/profile_pic/profile_profile_picture_thumbnail.jpg?Signature=HlVSayUIJj6dMyk%2F4KBtFlz0uJs%3D&Expires=1364418058&AWSAccessKeyId=[awsaccesskey]     
https://my.s3.amazonaws.com/cache/user_6/profile_pic/profile_profile_picture_thumbnail.jpg?Signature=xh2VxKys0pkq7yHpbJmH000wkwg%3D&Expires=1364418110&AWSAccessKeyId=[awsaccesskey]

As you can see, the signature is different each time. What can I do so it won't break my browser cache?

Zulu
Nuno_147
  • Possible duplicate of [Why does S3 (using with boto and django-storages) give signed url even for public files?](http://stackoverflow.com/questions/16777900/why-does-s3-using-with-boto-and-django-storages-give-signed-url-even-for-publi) – pawciobiel Oct 29 '15 at 10:11

6 Answers

10

In your settings, just add the following:

AWS_QUERYSTRING_AUTH = False

This will make sure that the URLs to the files are generated without the extra query-string parameters. Your URLs will look like:

https://my.s3.amazonaws.com/cache/user_6/profile_pic/profile_profile_picture_thumbnail.jpg
Deepak Prakash
  • Thanks! But what if I would like to keep some of my URLs secured? How can I do that? – Nuno_147 Mar 29 '13 at 21:40
  • @Nuno_147 : Sorry, I had made a mistake in the original answer. It should be AWS_QUERYSTRING_AUTH = False. Updated the answer to reflect this. – Deepak Prakash Mar 30 '13 at 20:47
  • @Nuno_147 : By secure, do you mean you want to add the query string auth for only certain files? Or do you mean https? – Deepak Prakash Mar 30 '13 at 20:50
  • 1
    what are the security implications of removing the querystring? – ecoe Jul 04 '15 at 00:55
  • To add the auth query string to some of the urls but not all, create another storage instance and pass the kwarg querystring_auth=True. – pawciobiel Oct 29 '15 at 10:09
  • The purpose of the querystring signature is for secure/private access. This solution makes everything public. If you want private access to files, you need to leave this true and cache the generated url instead. – gatlanticus Aug 29 '19 at 23:17
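The per-field approach mentioned in the comments can be sketched as follows. This is a minimal sketch, not part of the original answer; the model and field names are hypothetical, and it assumes the global AWS_QUERYSTRING_AUTH = False setting from this answer:

```python
from django.db import models
from storages.backends.s3boto import S3BotoStorage

# Signed, expiring URLs only for this storage instance;
# fields using the default storage keep plain, cacheable URLs.
signed_storage = S3BotoStorage(querystring_auth=True, acl='private')

class Document(models.Model):  # hypothetical model
    public_file = models.FileField()                         # unsigned URL, browser-cacheable
    private_file = models.FileField(storage=signed_storage)  # signed URL on each request
```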
7

When AWS_QUERYSTRING_AUTH = True (which is the default), django-storages generates a new temporary url each time we fetch the url.

If you don't want to generate a temporary url:

Add AWS_QUERYSTRING_AUTH = False to your settings.py

If you still want a temporary url:

Temporary urls remain valid for AWS_QUERYSTRING_EXPIRE seconds (3600 by default), so we can cache the temporary url, as long as we don't cache it for longer than it's valid. This way we can return the same url for subsequent page requests, allowing the client browser to fetch from its cache.

settings.py

# We subclass the default storage engine to add some caching
DEFAULT_FILE_STORAGE = 'project.storage.CachedS3Boto3Storage'

project/storage.py

import hashlib

from django.conf import settings
from django.core.cache import cache
from storages.backends.s3boto3 import S3Boto3Storage

class CachedS3Boto3Storage(S3Boto3Storage):
    """ adds caching for temporary urls """

    def url(self, name):
        # Add a prefix to avoid conflicts with any other apps
        key = hashlib.md5(f"CachedS3Boto3Storage_{name}".encode()).hexdigest()
        result = cache.get(key)
        if result:
            return result

        # No cached value exists, follow the usual logic
        result = super().url(name)

        # Cache the result for 3/4 of the temporary url's lifetime.
        timeout = getattr(settings, 'AWS_QUERYSTRING_EXPIRE', 3600)
        timeout = int(timeout * 0.75)
        cache.set(key, result, timeout)

        return result
Aaron
  • Getting the error "unicode-objects must be encoded before hashing" from line 13 key = hashlib.md5.... – Thorvald Jan 04 '22 at 21:42
  • 1
    Solved it by editing it to: key = hashlib.md5(f"CachedS3Boto3Storage_{name}".encode()).hexdigest() – Thorvald Jan 04 '22 at 21:50
4

Protect some file storage

Most of your media uploads – user avatars, for instance – you want to be public. But if you have some media that requires authentication before you can access it – say PDF resumes which are only accessible to members – then you don’t want S3BotoStorage’s default S3 ACL of public-read. Here we don’t have to subclass, because we can pass in an instance rather than refer to a class.

So first, disable query-string auth for all file fields in your settings and add cache control:

AWS_HEADERS = {
    'Cache-Control': 'max-age=86400',
}

# By default don't protect s3 urls and handle that in the model
AWS_QUERYSTRING_AUTH = False

Then make the file field you need protected use your custom protected storage:

from django.db import models
import storages.backends.s3boto

protected_storage = storages.backends.s3boto.S3BotoStorage(
    acl='private',
    querystring_auth=True,
    querystring_expire=600,  # 10 minutes; try to ensure people won't/can't share
)

class Profile(models.Model):
    resume = models.FileField(
        null=True,
        blank=True,
        help_text='PDF resume accessible only to members',
        storage=protected_storage,
    )

But you also need to use your normal storage in development, where you are usually using local storage, so this is how I personally went about it:

import logging

from django.conf import settings
from django.core.files.storage import FileSystemStorage
from storages.backends.s3boto import S3BotoStorage

logger = logging.getLogger(__name__)

if settings.DEFAULT_FILE_STORAGE == 'django.core.files.storage.FileSystemStorage':
    protected_storage = FileSystemStorage()
    logger.debug('Using FileSystemStorage for resume files')
else:
    protected_storage = S3BotoStorage(
        acl='private',
        querystring_auth=True,
        querystring_expire=86400,  # 24 hours; expiration helps ensure people won't/can't share after that
    )
    logger.debug('Using protected S3BotoStorage for resume files')

REF: https://tartarus.org/james/diary/2013/07/18/fun-with-django-storage-backends

Dr Manhattan
1

Your best bet is to subclass the Boto S3 Storage Backend and override the url method.

/project/root/storage.py

from django.conf import settings
from storages.backends.s3boto import S3BotoStorage

class S3Storage(S3BotoStorage):

    def url(self, name):
        name = self._clean_name(name)
        return '{0}{1}'.format(settings.MEDIA_URL, name)

/project/root/settings.py

MEDIA_URL = 'https://my.s3.amazonaws.com/'
DEFAULT_FILE_STORAGE = 'project.storage.S3Storage'
AWS_ACCESS_KEY_ID = '*******'
AWS_SECRET_ACCESS_KEY = '********'
AWS_STORAGE_BUCKET_NAME = 'your-bucket'

Just make sure your images are publicly readable.

krak3n
0

If you care about using signed urls but still want to cache them until they expire, just use django's built-in caching:

from django.conf import settings
from django.core.cache import cache
from django.db import models

class MyModel(models.Model):
    # using django-storages with AWS_QUERYSTRING_AUTH = True
    media = models.FileField()

    @property
    def url(self):
        """
        Return signed url, re-using cached value if not expired.
        """
        # Refresh url if within n seconds of expiry, to give clients
        # time to retrieve content from the bucket.
        # Make sure AWS_QUERYSTRING_EXPIRE is sufficiently larger than n.
        n = 30
        time_to_expiry = settings.AWS_QUERYSTRING_EXPIRE - n

        # Create a unique key for this instance's url in the cache
        key = '{0}{1}'.format(self.__class__.__name__, self.pk)

        url = cache.get(key)
        if not url:
            url = self.media.url    # refresh url via django-storages
            cache.set(key, url, time_to_expiry)

        return url
gatlanticus
0

Adding these lines to your settings.py file will enable caching for the image files:

AWS_S3_OBJECT_PARAMETERS = {
    'CacheControl': 'max-age=86400',
}
Akhil S