
I am using Amazon S3 to store static files for a Django project, but collectstatic is not finding updated files - only new ones.

I've been looking for an answer for ages, and my guess is that I have something configured incorrectly. I followed this blog post to help get everything set up.

I also ran into this question which seems identical to my problem, but I have tried all the solutions already.

I even tried using this plugin which is suggested in this question.

Here is some information that might be useful:

settings.py

...
STATICFILES_FINDERS = (
    'django.contrib.staticfiles.finders.FileSystemFinder',
    'django.contrib.staticfiles.finders.AppDirectoriesFinder',
    'django.contrib.staticfiles.finders.DefaultStorageFinder',
)
...
# S3 Settings
AWS_STORAGE_BUCKET_NAME = os.environ['AWS_STORAGE_BUCKET_NAME']
STATICFILES_STORAGE = 'storages.backends.s3boto.S3BotoStorage'
S3_URL = 'http://%s.s3.amazonaws.com/' % AWS_STORAGE_BUCKET_NAME
STATIC_URL = S3_URL
AWS_PRELOAD_METADATA = False

requirements.txt

...
Django==1.5.1
boto==2.10.0
django-storages==1.1.8
python-dateutil==2.1

Edit1:

I apologize if this question is too unique to my own circumstances to be of any help to a large audience. Nonetheless, this has been hampering my productivity for a long time and I have wasted many hours looking for solutions, so I am starting a bounty to reward anyone who can help troubleshoot this problem.

Edit2:

I just ran across a similar problem somewhere. I am in a different timezone than my AWS bucket's region. If collectstatic compares timestamps by default, could this interfere with the process?
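
For example, here is a rough sketch of the pitfall I suspect (illustrative only; the timestamps and timezone are made up):

import pytz
from datetime import datetime

utc = pytz.utc
eastern = pytz.timezone('US/Eastern')

# Stale copy already on S3; S3 reports Last-Modified in UTC.
s3_last_modified = utc.localize(datetime(2013, 8, 30, 19, 0, 0))

# I edit the file locally at 15:30 Eastern, which is 19:30 UTC -- newer than the S3 copy.
local_mtime_naive = datetime(2013, 8, 30, 15, 30, 0)

# Comparing the naive local time against S3's UTC value (the suspected bug):
print(local_mtime_naive < s3_last_modified.replace(tzinfo=None))  # True: looks older, would be skipped

# Comparing after localizing properly:
print(eastern.localize(local_mtime_naive) < s3_last_modified)  # False: correctly newer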

Thanks

– Joker

5 Answers


I think I solved this problem. Like you, I have spent many hours on it, and I am also on the bug report you found on Bitbucket. Here is what I just accomplished.

I had

django-storages==1.1.8
Collectfast==0.1.11

This combination does not work at all. Deleting everything and starting over does not work either: after that, it cannot pick up modifications and refuses to update anything.

The problem is our time zone. S3 reports that the files it holds were last modified later than the ones we want to upload, so Django's collectstatic does not try to copy the new ones over at all; it just calls the files "unmodified". For example, here is what I saw before my fix:

Collected static files in 0:00:45.292022.
Skipped 407 already synced files.
0 static files copied, 1 unmodified.
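
For reference, the per-file check collectstatic runs boils down to something like this (my simplified paraphrase of Django 1.5's collectstatic command, not the verbatim source):

# Sketch of the skip check collectstatic performs for each file.
def should_skip(remote_storage, source_storage, path):
    if not remote_storage.exists(path):
        return False  # new files are always uploaded, which is why those still work
    target_last_modified = remote_storage.modified_time(path)  # what S3 reports
    source_last_modified = source_storage.modified_time(path)  # local mtime
    # If S3 claims an equal-or-newer timestamp, the file counts as "unmodified".
    # With a timezone skew this stays true for hours after a local edit,
    # so updates are never copied.
    return target_last_modified >= source_last_modified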

My solution is: to hell with modified time! Besides the time zone problem we are solving here, what if I make a mistake and need to roll back? Timestamp comparison will refuse to deploy the old static files and leave my website broken.

Here is my pull request to Collectfast: https://github.com/FundedByMe/collectfast/pull/11. I left a flag in, so if you really want to check modified time, you still can. Until it gets merged, just use my code at https://github.com/sunshineo/collectfast

Have a nice day!

--Gordon

PS: Stayed up till 4:40am for this. My day is ruined for sure.

– Gordon Sun
  • It is worth noting that this feature landed in collectfast eventually. So no need to use sunshineo's fork. – Sethish Apr 30 '14 at 18:08

After hours of digging around, I found this bug report.

I changed my requirements to revert to a previous version of django-storages.

django-storages==1.1.5
– Maxime Lorant
  • 1.1.5 started deleting my media files as well. Good thing that I was testing it on dev! – AliBZ Jul 30 '16 at 00:15

You might want to consider using this plugin written by antonagestam on GitHub: https://github.com/FundedByMe/collectfast

It compares the checksums of the files, which is a reliable way of determining when a file has changed. It's the accepted answer at this other Stack Overflow question: Faster alternative to manage.py collectstatic (w/ s3boto storage backend) to sync static files to s3?
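
Conceptually, the comparison it makes looks something like this (my own illustrative sketch of the idea, not Collectfast's actual code):

import hashlib

def local_md5(path, chunk_size=8192):
    # Hex md5 of a local file, computed in chunks to handle large files.
    md5 = hashlib.md5()
    with open(path, 'rb') as f:
        for chunk in iter(lambda: f.read(chunk_size), b''):
            md5.update(chunk)
    return md5.hexdigest()

def needs_upload(path, s3_etag):
    # For non-multipart uploads, S3's ETag is the file's hex md5 (quoted).
    # Timestamps never enter the comparison, so clock or timezone skew
    # cannot cause a changed file to be skipped.
    return local_md5(path) != s3_etag.strip('"')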

– sabreshack
  • This still didn't work. Even while using it, I can only update a file by manually deleting it first on S3. On the bright side, this did improve the speed significantly, but that wasn't much of a problem. – Joker Aug 30 '13 at 04:45
  • Hmm, perhaps you might need to delete all your existing files that have been collected via previous calls to 'collectstatic', and then use this new collectstatic command so that all the files are newly collected based on their md5 checksum. – sabreshack Aug 30 '13 at 17:57
  • The reason is that when this new collectstatic command uploads files onto S3, it computes and stores the md5 checksum of the file, so that later it can compare it to new local versions of the same file. If the file was uploaded using the old collectstatic command (which does not compute the checksum), then there is no way for the new collectstatic command to compare the checksums, because it can't compute the checksum for the file that already resides on S3. – sabreshack Aug 30 '13 at 18:01
  • You'll only have to delete things on S3 ONCE, just to get all the checksums. In subsequent collections, you can just run the collectstatic command and it'll upload the updated files. – sabreshack Aug 30 '13 at 18:33
  • Right, what you are saying makes sense, that is how it should work too. I have deleted the files over and over, and they are not being uploaded with the correct checksum when using version 1.1.8 (or I assume this because it does not work with this version). Everything is working for now with v1.1.5 - I'd love to give you the bounty but I don't want to mislead others with the same problem I am having. Thanks for referring me to the plugin though, it's quite useful. – Joker Aug 31 '13 at 00:57
  • It would be interesting to know why it could not detect differences in the checksums. I ran a test after editing the file, and checked the md5 checksum before and after - they were in fact different. I was using a linux command to determine this, and I haven't looked in to what the plugin uses to calculate the checksum. – Joker Aug 31 '13 at 01:12
  • Hmm, that's weird that your static file collection sped up significantly, even though it is still not uploading only the recently modified files. How else could it speed it up? – sabreshack Sep 01 '13 at 05:51
  • @Joker It might be helpful to run collectstatic with --verbosity=3, that will show you on a more granular level which files are skipped and which are copied. – antonagestam Apr 14 '14 at 20:04

There are some good answers here, but I spent some time on this today, so I figured I'd contribute one more in case it helps someone in the future. Following advice found in other threads, I confirmed that, for me, this was indeed caused by a difference in time zones: my Django time wasn't incorrect, but it was set to EST while S3 was set to GMT. In testing, I reverted to django-storages 1.1.5, which did seem to get collectstatic working. Partly due to personal preference, I was unwilling to a) roll back three versions of django-storages and lose any potential bug fixes, or b) alter time zones for components of my project for what essentially boils down to a convenience function (albeit an important one).

I wrote a short script to do the same job as collectstatic without the aforementioned alterations. It will need a little modifying for your app, but it should work for standard cases if it is placed at the app level and static_dirs is replaced with the names of your project's apps. It is run from the terminal with python whatever_you_call_it.py -e environment_name, where the environment name selects the AWS bucket (see the usage example after the script).

import os, os.path, time
import argparse
from datetime import datetime, timedelta

import pytz
from boto3.session import Session

utc = pytz.UTC
DEV_BUCKET_NAME = 'dev-homfield-media-root'
PROD_BUCKET_NAME = 'homfield-media-root'
static_dirs = ['accounts', 'messaging', 'payments', 'search', 'sitewide']

def main():
    try: 
        parser = argparse.ArgumentParser(description='Homfield Collectstatic. Our version of collectstatic to fix django-storages bug.\n')
        parser.add_argument('-e', '--environment', type=str, required=True, help='Name of environment (dev/prod)')
        args = parser.parse_args()
        vargs = vars(args)
        if vargs['environment'] == 'dev':
            selected_bucket = DEV_BUCKET_NAME
            print "\nAre you sure? You're about to push to the DEV bucket. (Y/n)"
        elif vargs['environment'] == 'prod':
            selected_bucket = PROD_BUCKET_NAME
            print "Are you sure? You're about to push to the PROD bucket. (Y/n)"
        else:
            raise ValueError

        acceptable = ['Y', 'y', 'N', 'n']
        confirmation = raw_input().strip()
        while confirmation not in acceptable:
            print "That's an invalid response. (Y/n)"
            confirmation = raw_input().strip()

        if confirmation == 'Y' or confirmation == 'y':
            run(selected_bucket)
        else:
            print "Collectstatic aborted."
    except Exception as e:
        print type(e)
        print "An error occured. S3 staticfiles may not have been updated."


def run(bucket_name):

    #open session with S3
    session = Session(aws_access_key_id='{aws_access_key_id}',
        aws_secret_access_key='{aws_secret_access_key}',
        region_name='us-east-1')
    s3 = session.resource('s3')
    bucket = s3.Bucket(bucket_name)

    # loop through static directories
    for directory in static_dirs:
        rootDir = './' + directory + "/static"
        print('Checking directory: %s' % rootDir)

        #loop through subdirectories
        for dirName, subdirList, fileList in os.walk(rootDir):
            #loop through all files in subdirectory
            for fname in fileList:
                try:
                    if fname == '.DS_Store':
                        continue

                    # find the file's last modified time; the +5h shifts my machine's
                    # Eastern wall-clock mtime to UTC (hard-coded offset, adjust for your zone)
                    full_path = dirName + "/" + fname
                    last_mod_string = time.ctime(os.path.getmtime(full_path))
                    file_last_mod = datetime.strptime(last_mod_string, "%a %b %d %H:%M:%S %Y") + timedelta(hours=5)
                    file_last_mod = utc.localize(file_last_mod)

                    # truncate the path for the S3 loop; find the object, then delete and re-upload it if it has been updated
                    s3_path = full_path[full_path.find('static'):]
                    found = False
                    for key in bucket.objects.all():
                        if key.key == s3_path:
                            found = True 
                            last_mod_date = key.last_modified
                            if last_mod_date < file_last_mod:
                                key.delete()
                                s3.Object(bucket_name, s3_path).put(Body=open(full_path, 'rb'), ContentType=get_mime_type(full_path))
                                print "\tUpdated : " + full_path
                    if not found:
                        # if file not found in S3 it is new, send it up
                        print "\tFound a new file. Uploading : " + full_path
                        s3.Object(bucket_name, s3_path).put(Body=open(full_path, 'rb'), ContentType=get_mime_type(full_path))
                except Exception:
                    print "ALERT: Big time problems with: " + full_path + ". I'm bowin' out dawg, this shitz on u."


def get_mime_type(full_path):
    try:
        last_index = full_path.rfind('.')
        if last_index < 0:
            return 'application/octet-stream'
        extension = full_path[last_index:]
        return {
            '.js' : 'application/javascript',
            '.css' : 'text/css',
            '.txt' : 'text/plain',
            '.png' : 'image/png',
            '.jpg' : 'image/jpeg',
            '.jpeg' : 'image/jpeg',
            '.eot' : 'application/vnd.ms-fontobject',
            '.svg' : 'image/svg+xml',
            '.ttf' : 'application/octet-stream',
            '.woff' : 'application/x-font-woff',
            '.woff2' : 'application/octet-stream'
        }[extension]
    except KeyError:
        print 'ALERT: Couldn\'t match mime type for ' + full_path + '. Sending to S3 as application/octet-stream.'
        return 'application/octet-stream'

if __name__ == '__main__':
    main()
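
For example, to push to the dev bucket defined at the top of the script:

python whatever_you_call_it.py -e dev
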
– RyCSmith

I had a similar problem pushing new files to an S3 bucket that had previously been working well. It turned out not to be a problem with Django or Python at all: on my end, I fixed the issue by deleting my local repository and cloning it again.