I've got this model in my Django app:

class Image(models.Model):
    image_file = models.ImageField(
        upload_to='images/', 
        width_field='width',
        height_field='height'
    )
    width = models.PositiveIntegerField(
        blank = True, null = True,
        editable = False
    )
    height = models.PositiveIntegerField(
        blank = True, null = True,
        editable = False
    )

    sha1 = models.CharField(max_length=40, blank=True, editable=False)  # a SHA-1 hex digest is 40 characters
    filesize = models.PositiveIntegerField(blank=True, null=True, editable=False)

I can now upload images through the Django admin site, and the width and height are saved to the database automatically on upload, thanks to the ImageField's width_field and height_field parameters.

But I'd also like it to automatically work out the uploaded file's size and SHA-1 digest, and save those properties too. How would I do this?

callum

3 Answers

It's been a while, but something like this should work:

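import hashlib

from django.db import models


class Image(models.Model):
    # ... fields as in the question ...

    def save(self, *args, **kwargs):
        super(Image, self).save(*args, **kwargs)
        sha1 = hashlib.sha1()
        # Hash in chunks so a large upload is never read into memory at once;
        # chunks() also works for files small enough to fit in a single chunk.
        self.image_file.open('rb')
        for chunk in self.image_file.chunks():
            sha1.update(chunk)
        self.image_file.close()
        self.sha1 = sha1.hexdigest()
        self.filesize = self.image_file.size
        # Note: these two values are set after save(), so they are not
        # written to the database unless save() is called again.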

EDIT: Added suggestion by Dan on reading by chunk. Default chunk size is 64KB.

Burhan Khalid
  • 1
    It would be better to read the file in blocks and feed those blocks to the hash function: `hasher = hashlib.sha1()` / `while True:` / `block = f.read(1<<20) # 1 MiB` / `if not block: break` / `hasher.update(block)`. This avoids problems if the files to be hashed end up much larger than a few MB, for example when your app doesn't limit the file size and someone uploads a DVD image. – Dan D. Jan 23 '12 at 10:09
  • 2
    I think you need to call `super(Image, self).save(*args, **kwargs)` at the end of the method again in order to write `self.sha1` and `self.filesize` to the DB! – Philipp der Rautenberg Jan 22 '13 at 10:18

Burhan Khalid's answer is only part of the solution: it doesn't save the computed values to the database. Here is a complete solution, which also uses a with statement to take advantage of the context-manager support in Python and Django file objects (so no explicit file.close() is needed; it happens automatically):

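import hashlib

from django.db import models


class Image(models.Model):
    # ... fields as in the question ...

    def save(self, *args, **kwargs):
        # Using open() in a with statement relies on Django 2.0 or later.
        with self.image_file.open('rb') as f:
            sha1 = hashlib.sha1()
            if f.multiple_chunks():
                for chunk in f.chunks():
                    sha1.update(chunk)
            else:
                sha1.update(f.read())
            self.sha1 = sha1.hexdigest()
            self.filesize = self.image_file.size
            # Save inside the with block, while the file is still open.
            super(Image, self).save(*args, **kwargs)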

Note that super().save() is called inside the with block. This is important: otherwise you get ValueError: I/O operation on closed file, because Django tries to read the file after it has already been closed. It is also the last statement, so everything set above is written to the database. (The previous top answer leaves this out, so you would most likely have to call save() a second time to actually store those values.)

Siddharth Pant

I'm not sure if you can do it automatically. But an ImageField is also a FileField so you can always open the file and calculate the checksum using hashlib.sha1. You will have to read the file to calculate the checksum so you can sniff the size at the same time.
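
As a rough illustration of computing both values in a single pass, here is a minimal sketch that uses only the standard library. The helper name sha1_and_size and the in-memory sample file are assumptions made for the example, not part of Django; in a model you would pass the opened image file instead.

```python
import hashlib
import io


def sha1_and_size(fileobj, chunk_size=64 * 1024):
    """Return (hex digest, byte count) for a binary file-like object.

    Hypothetical helper for illustration; not a Django API.
    """
    digest = hashlib.sha1()
    size = 0
    # iter() with a sentinel keeps calling read() until it returns b"".
    for chunk in iter(lambda: fileobj.read(chunk_size), b""):
        digest.update(chunk)
        size += len(chunk)
    return digest.hexdigest(), size


# Stand-in for an uploaded file; a real upload would be opened in 'rb' mode.
sample = io.BytesIO(b"abc")
checksum, size = sha1_and_size(sample)
print(checksum, size)
```

Reading in fixed-size chunks keeps memory use bounded no matter how large the upload is. The hex digest is always 40 characters long, so any field that stores it needs room for at least that many.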

It has been a while since I have used Django's ORM, but I believe that there is a way to write a method that is called whenever the model instance is saved to or read from the underlying storage. This would be a good place to do the calculation.

D.Shawley