3

Given a Django model like so:

from django.db import models

class MyModel(models.Model):
    textfield = models.TextField()

How can one automatically compress textfield (e.g. with zlib) on save() and decompress it when the property textfield is accessed (i.e. not on load), with a workflow like this:


m = MyModel()
m.textfield = "Hello, world, how are you?"
m.save() # compress textfield on save
m.textfield # no decompression
id = m.id

m = MyModel.objects.get(pk=id) # textfield still compressed
m.textfield # textfield decompressed

I'd be inclined to think that you would override MyModel.save, but I don't know the pattern for in-place modification of the field when saving. I also don't know the best way in Django to decompress the field when it's accessed (override __getattr__?).

Or would a better way to do this be to have a custom field type?

I'm certain I've seen an example of almost exactly this, but alas I've not been able to find it recently.

Thank you for reading – and for any input you may be able to provide.

Brian M. Hunt

4 Answers

2

Custom field types are definitely the way to go here. This is the only reliable way to ensure that the field is compressed on save and decompressed on load. Make sure you set the metaclass as described in your link.

Daniel Roseman
  • 1
    Absolutely agree. They are easy to create and can simplify any number of common problems. Google on "django strip charfield" for several takes on a simple extension to models.CharField(), namely stripping leading/trailing spaces. Frankly, I'm surprised it never made it into trunk. – Peter Rowell Apr 01 '10 at 01:21
2

You need to implement to_python and get_prep_value in your custom field type to respectively decompress and compress your data.
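
As a rough sketch of what those two hooks could delegate to, here is a zlib round trip using only the standard library. The helper names compress_text and decompress_text are made up for illustration and are not part of Django. The base64 step is an assumption on my part: it keeps the stored value plain ASCII so it still fits in a TextField column. Also note that on Django 1.8 and later, values loaded from the database pass through from_db_value rather than to_python, so a field would need to decompress there as well.

```python
import base64
import zlib


def compress_text(text):
    """Hypothetical helper for get_prep_value(): str -> ASCII-safe str."""
    packed = zlib.compress(text.encode("utf-8"))
    # base64 keeps the compressed bytes storable in a text column
    return base64.b64encode(packed).decode("ascii")


def decompress_text(stored):
    """Hypothetical helper for to_python(): reverses compress_text()."""
    packed = base64.b64decode(stored)
    return zlib.decompress(packed).decode("utf-8")


original = "Hello, world, how are you? " * 50
stored = compress_text(original)
restored = decompress_text(stored)
```

For repetitive text like the sample above the stored form is much shorter than the original; for very short strings the base64 overhead can make it longer, so a real field may want to skip compression in that case.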

Pierre-Jean Coudert
1

I guess it's worth mentioning that PostgreSQL compresses by default for all string types: Text compression in PostgreSQL

So maybe the answer is: Don't?

mlissner
0

Also see https://djangosnippets.org/snippets/2014/ Seems a bit easier... Still just a TextField under the hood.

from django.db import models

class CompressedTextField(models.TextField):
    """
    Model field for storing text in a compressed format (bz2, then base64)
    """
    # Python 2 metaclass syntax; SubfieldBase was removed in Django 1.10
    __metaclass__ = models.SubfieldBase

    def to_python(self, value):
        if not value:
            return value

        try:
            return value.decode('base64').decode('bz2').decode('utf-8')
        except Exception:
            return value

    def get_prep_value(self, value):
        if not value:
            return value

        try:
            # If the value already decodes as base64, assume it is
            # already compressed and store it unchanged
            value.decode('base64')
            return value
        except Exception:
            try:
                tmp = value.encode('utf-8').encode('bz2').encode('base64')
            except Exception:
                return value
            else:
                if len(tmp) > len(value):
                    return value

                return tmp
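
The 'base64' and 'bz2' string codecs used above only exist on Python 2. A possible Python 3 equivalent of the same encode/decode chain, written as standalone functions, might look like the sketch below. The names encode_for_storage and decode_stored are hypothetical, and this version drops the "already base64" check from get_prep_value, so it is not a line-for-line port.

```python
import base64
import bz2


def decode_stored(value):
    """Counterpart of to_python: base64 -> bz2 -> UTF-8 text."""
    if not value:
        return value
    try:
        raw = base64.b64decode(value, validate=True)
        return bz2.decompress(raw).decode("utf-8")
    except (ValueError, OSError):
        # Not valid base64/bz2 data: treat it as uncompressed text
        return value


def encode_for_storage(value):
    """Counterpart of get_prep_value: UTF-8 -> bz2 -> base64 text."""
    if not value:
        return value
    packed = base64.b64encode(bz2.compress(value.encode("utf-8")))
    packed = packed.decode("ascii")
    # Keep the raw text when compression does not save any space
    return value if len(packed) > len(value) else packed


long_text = "Hello, world, how are you? " * 100
stored = encode_for_storage(long_text)
```

As in the snippet, short values are stored uncompressed, and anything that fails to decode is returned as-is when reading.
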
Kevin Parker
  • This may theoretically answer the question, but it would be best to include the essential parts of the answer here for future users, and provide the link for reference. [Link-dominated answers](//meta.stackexchange.com/questions/8231) can become invalid through [link rot](//en.wikipedia.org/wiki/Link_rot). – Mogsdad May 04 '16 at 20:07