9

How can I ensure that my custom field's *to_python()* method is only called when the data in the field has been loaded from the DB?

I'm trying to use a Custom Field to handle the Base64 Encoding/Decoding of a single model property. Everything appeared to be working correctly until I instantiated a new instance of the model and set this property with its plaintext value...at that point, Django tried to decode the field but failed because it was plaintext.

The allure of the Custom Field implementation was that I thought I could handle 100% of the encoding/decoding logic there, so that no other part of my code ever needed to know about it. What am I doing wrong?

(NOTE: This is just an example to illustrate my problem, I don't need advice on how I should or should not be using Base64 Encoding)

import base64

from django.db import models


def encode(value):
    return base64.b64encode(value)

def decode(value):
    return base64.b64decode(value)


class EncodedField(models.CharField):
    __metaclass__ = models.SubfieldBase

    def __init__(self, max_length, *args, **kwargs):
        # Forward max_length to CharField
        kwargs['max_length'] = max_length
        super(EncodedField, self).__init__(*args, **kwargs)

    def get_prep_value(self, value):
        return encode(value)

    def to_python(self, value):
        return decode(value)

class Person(models.Model):
    internal_id = EncodedField(max_length=32)

...and it breaks when I do this in the interactive shell. Why is it calling to_python() here?

>>> from myapp.models import *
>>> Person(internal_id="foo")
Traceback (most recent call last):
  File "<console>", line 1, in <module>
  File "/usr/local/lib/python2.6/dist-packages/django/db/models/base.py", line 330, in __init__
    setattr(self, field.attname, val)
  File "/usr/local/lib/python2.6/dist-packages/django/db/models/fields/subclassing.py", line 98, in __set__
    obj.__dict__[self.field.name] = self.field.to_python(value)
  File "../myapp/models.py", line 87, in to_python
    return decode(value)
  File "../myapp/models.py", line 74, in decode
    return base64.b64decode(value)
  File "/usr/lib/python2.6/base64.py", line 76, in b64decode
    raise TypeError(msg)
TypeError: Incorrect padding

I had expected I would be able to do something like this...

>>> from myapp.models import *
>>> obj = Person(internal_id="foo")
>>> obj.internal_id
'foo'
>>> obj.save()
>>> newObj = Person.objects.get(internal_id="foo")
>>> newObj.internal_id
'foo'
>>> newObj.internal_id = "bar"
>>> newObj.internal_id
'bar'
>>> newObj.save()

...what am I doing wrong?

Mechanical snail
Adam Levy

3 Answers

4

(from http://davidcramer.posterous.com/code/181/custom-fields-in-django.html
and https://docs.djangoproject.com/en/dev/howto/custom-model-fields/#converting-database-values-to-python-objects)

It seems that you need to be able to test whether the value is already an instance of the Python type, and the problem is that both forms are the same type (plain string vs. Base64-encoded string). So unless you can determine the difference, I would suggest making sure you always encode the value yourself:

Person(internal_id="foo".encode('base64', 'strict'))

or

Person(internal_id=base64.b64encode("foo"))

or some such encoding.

EDIT: I was looking at https://github.com/django-extensions/django-extensions/blob/f278a9d91501933c7d51dffc2ec30341a1872a18/django_extensions/db/fields/encrypted.py and thought you could do something similar.
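
Something similar for the Base64 case might look like the sketch below. It is a minimal Python 3 sketch without Django, not the django-extensions code: the `b64:` marker is a made-up prefix, and the module-level `get_prep_value` and `to_python` functions are assumptions standing in for the field's methods.

```python
import base64

# Hypothetical marker; pick something that cannot start a real plaintext value.
PREFIX = "b64:"


def get_prep_value(value):
    """Encode a plaintext str for storage and tag it with the marker."""
    if value is None:
        return None
    encoded = base64.b64encode(value.encode("utf-8")).decode("ascii")
    return PREFIX + encoded


def to_python(value):
    """Decode only values that carry the marker; pass anything else through."""
    if isinstance(value, str) and value.startswith(PREFIX):
        return base64.b64decode(value[len(PREFIX):]).decode("utf-8")
    return value


stored = get_prep_value("foo")
print(stored)              # b64:Zm9v
print(to_python(stored))   # foo (value as it would come back from the DB)
print(to_python("foo"))    # foo (plaintext assigned by user code is untouched)
```

The caveat is that a plaintext value that itself starts with `b64:` would be decoded by mistake, so the marker has to be something that cannot occur in real data.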

James Khoury
  • +1 for the django-extensions citation -- it's not just limited to Django fields, the issue can arise when writing properties with descriptor `__set__`/`__get__` methods... That prefix trick may be hacky but it's quite effective. – fish2000 Sep 19 '12 at 12:57
  • +1 Thnx @James Khoury ...your suggested [edit source code](https://github.com/django-extensions/django-extensions/blob/f278a9d91501933c7d51dffc2ec30341a1872a18/django_extensions/db/fields/encrypted.py#L56) is very useful for me... – Pradip Kachhadiya May 02 '21 at 20:50
3

I have the exact same problem, but with JSON data. I want to store data in the database in JSON format. However, if you try to store an already-serialized JSON string, it will be returned deserialized. So the issue is that what goes in is not always what comes out. In particular, if you try to store a number as a string, it will be returned as an int or float, since it is deserialized by to_python before being stored.

The solution is simple, albeit not too elegant. Simply make sure to store the "type" of data along with the data, in my case, it is JSON data, so I prefix it with "json:", and thus you always know if the data is coming from the database.

import json  # at module level

def get_prep_value(self, value):
    if value is not None:
        value = "json:" + json.dumps(value)
    return value

def to_python(self, value):
    # Values read back from the database carry the "json:" marker
    if isinstance(value, basestring) and value.startswith("json:"):
        value = value[5:]
        try:
            return json.loads(value)
        except ValueError:
            # If the payload is not valid JSON, just return the plain value
            return value
    else:
        return value

That being said, it's annoying that you can't expect consistent behavior, or that you can't tell whether to_python is operating on a user set value or a value from the DB.
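
To see why the prefix helps, here is a small stand-in in plain Python 3 (no Django). The `__set__` method imitates the descriptor visible in the question's traceback, which calls `to_python()` on every assignment. `JSONField` and `Record` are made-up names for illustration, and `str` replaces the Python 2 `str`/`unicode` check, so treat this as a sketch of the idea rather than Django's actual code.

```python
import json


class JSONField:
    """Stand-in for the field above: same two methods, no Django."""

    def __init__(self, name):
        self.name = name

    def get_prep_value(self, value):
        if value is not None:
            value = "json:" + json.dumps(value)
        return value

    def to_python(self, value):
        if isinstance(value, str) and value.startswith("json:"):
            try:
                return json.loads(value[5:])
            except ValueError:
                return value
        return value

    # Every assignment, whether from user code or from a DB row,
    # goes through to_python(), as in the question's traceback.
    def __set__(self, obj, value):
        obj.__dict__[self.name] = self.to_python(value)

    def __get__(self, obj, owner=None):
        if obj is None:
            return self
        return obj.__dict__[self.name]


class Record:
    data = JSONField("data")

    def __init__(self, data):
        self.data = data


user_made = Record("42")                            # set by user code
db_value = Record.data.get_prep_value(user_made.data)
from_db = Record(db_value)                          # simulates loading a row

print(repr(user_made.data))   # '42' (the string is not turned into an int)
print(db_value)               # json:"42"
print(repr(from_db.data))     # '42'
```

Without the marker, `to_python("42")` would have to guess, and `json.loads("42")` would hand back the integer 42.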

zeraien
  • darn! this is the only solution I was looking for. It turns out `to_python()` is rampant in django source code and it's called everywhere like a mess. Wtf to call the same function in form->db and db-> form? Who designed this? – est Jan 06 '14 at 04:09
0

Do you only get the TypeError when you first assign a value to the field? You could just write a try/except around it:

def to_python(self, value):
    try:
        return decode(value)
    except TypeError:
        return value

It's not the cleanest solution, but you could try that and see if it lets you work with the field the way you're expecting to.

girasquid
  • Yeah, that occurred to me as well...I was hoping for a cleaner solution. I'm pretty new to Django in general, and specifically this is my first time using a custom field, so my assumption was that either I'm missing something in the declaration of my custom field, or that there's a completely different, more appropriate way to meet my needs. – Adam Levy Dec 22 '10 at 15:14
  • Turns out this was actually the only solution I could find. Disappointed there's not a better way to do this. – Adam Levy Dec 24 '10 at 20:40
  • This will fail if the data you store happens to be valid Base64 input (e.g. `b'abcd'`). Then no `TypeError` is raised, and the result is wrong. – Mechanical snail Aug 09 '11 at 00:20
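
The failure mode described in the last comment can be reproduced without Django. This is a Python 3 port of the try/except variant, written as a standalone function for illustration. One assumption to note: in Python 3 a bad input raises `binascii.Error` rather than `TypeError`, so the sketch catches both.

```python
import base64
import binascii


def to_python(value):
    """try/except variant from the answer, as a plain function."""
    try:
        return base64.b64decode(value)
    except (TypeError, binascii.Error):
        return value


print(to_python("foo"))    # foo: bad padding, falls back to the plain value
print(to_python("abcd"))   # b'i\xb7\x1d': valid Base64, silently decoded
```

Any plaintext whose length is a multiple of four and that uses only Base64 characters is affected the same way as `"abcd"`.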