This question builds on "How to send a pandas dataframe using POST method and receive it in Hug/other REST API...":

  • How do I send binary data (the pickled Pandas Dataframe) using Django REST API to a models.BinaryField?

Below are my attempts so far; I cannot get them to work.

import pandas as pd
df = pd.DataFrame({'a': [0, 1, 2, 3]})

import pickle
pickled = pickle.dumps(df)

import base64
pickled_b64 = base64.b64encode(pickled)

I want to send the pickled_b64 object by POST, via the API, to the destination server (www.foreignserver.com):

import requests
r = requests.post('http://www.foreignserver.com/api/DataFrame/', data = {'df_object':pickled_b64})

On the server www.foreignserver.com, using Django REST Framework v3.9.4:

models.py

class DataFrame(models.Model):
    df_object = models.BinaryField(blank=True, null=True)

serializers.py

class DataFrameSerializer(serializers.HyperlinkedModelSerializer):
    class Meta:
        model = DataFrame
        fields = ('__all__')

views.py

class DataFrameView(viewsets.ModelViewSet):
    queryset = DataFrame.objects.all()
    serializer_class = DataFrameSerializer

Result:

  • A record is created, but with a null value: the framework does not save the binary data, even though I get a 201 response.

    [09/Sep/2019 13:23:53] "POST /api/DataFrame/ HTTP/1.1" 201 64

To check whether the memoryview is actually empty, I retrieved the stored object and inspected it with:

print(list(object))
len(object)

Both showed that the object is empty (length 0).

After some digging around I found that:

"BinaryField are not supported by Django REST framework. You'll need to write a serializer field class and declare it in a mapping to make this work." ref: Django Rest do not show postgresql binary field in view

Jaco
  • Note that pickle is not safe: an attacker can craft a malicious file, so take this into account for production. – AviKKi Sep 10 '19 at 05:52
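
The comment above can be illustrated with a harmless sketch. An object can tell pickle which callable to run when it is loaded, so calling `pickle.loads` on untrusted input amounts to running the sender's code. The class name `Payload` is made up for illustration, and `eval` on a fixed arithmetic string stands in for whatever an attacker might choose:

```python
import pickle


class Payload:
    """Hypothetical class: shows that unpickling can call arbitrary callables."""

    def __reduce__(self):
        # Tells pickle: "to rebuild me, call eval('6 * 7')"
        return (eval, ("6 * 7",))


blob = pickle.dumps(Payload())

# Loading does not return a Payload instance; it returns whatever
# the callable named in the pickle stream returned.
result = pickle.loads(blob)
print(result)  # 42
```

For data that crosses a trust boundary, a format that cannot carry code (JSON or CSV, for example) is the more cautious choice.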

2 Answers

After some work I can answer my own question. The explanations below are my own interpretation, and I may misunderstand what actually happens in some steps. Nevertheless, the code below works as intended and fully answers the question.

1. Since DRF has no native support for the model BinaryField, the first step is to write your own field:

import base64

from rest_framework import serializers


class MyBinaryField(serializers.Field):
    def to_internal_value(self, obj):
        # Incoming data -> database value. DRF seems to see the POSTed
        # value as a string (even though it was sent as base64 bytes), so
        # decoding it recovers the underlying pickle.dumps() bytes.
        return base64.b64decode(obj)

    def to_representation(self, value):
        # Database value -> API output. The raw bytes are base64-encoded
        # so that they can be displayed in the response.
        return base64.b64encode(value)
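
Stripped of the framework, the two methods of the field are a base64 round trip. A minimal sketch of that logic follows, assuming hypothetical helper names `decode_incoming` and `encode_outgoing` (they are not part of DRF) and a plain dict in place of the DataFrame so that only the standard library is needed:

```python
import base64
import binascii
import pickle


def decode_incoming(data):
    """Mirror of to_internal_value: base64 text (str or bytes) -> raw bytes."""
    try:
        # validate=True rejects characters outside the base64 alphabet
        return base64.b64decode(data, validate=True)
    except (binascii.Error, ValueError) as exc:
        raise ValueError("payload is not valid base64") from exc


def encode_outgoing(value):
    """Mirror of to_representation: raw bytes or memoryview -> ASCII str."""
    return base64.b64encode(bytes(value)).decode("ascii")


original = {"a": [0, 1, 2, 3]}   # stand-in for the DataFrame
raw = pickle.dumps(original)     # what the BinaryField should store
wire = encode_outgoing(raw)      # what travels over HTTP
restored = pickle.loads(decode_incoming(wire))
```

Returning a `str` from the outgoing side, rather than `bytes`, is a design choice made here so that the value can be placed in a JSON response without relying on the renderer to convert it.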

2. Once you have the new field, use it in a serializer and define the create and update methods. Please note that I could not get serializers.ModelSerializer to work, so I use serializers.Serializer:

class DataFrameSerializer(serializers.Serializer):
    serializer_field_mapping = (
        serializers.ModelSerializer.serializer_field_mapping.copy()
    )
    serializer_field_mapping[models.BinaryField] = MyBinaryField

    df_object = MyBinaryField()

    def create(self, validated_data):
        """
        Create and return a new `DataFrame` instance, given the validated data.
        """
        return DataFrame.objects.create(**validated_data)

    def update(self, instance, validated_data):
        """
        Update and return an existing `DataFrame` instance, given the validated data.
        """
        instance.df_object = validated_data.get('df_object', instance.df_object)
        instance.save()
        return instance

3. Finally, you define your view

class DataFrameView(viewsets.ModelViewSet):
    queryset = DataFrame.objects.all()
    serializer_class = DataFrameSerializer

4. Then, you can access and POST data through the API

import pickle
import requests
import base64
import pandas as pd

df = pd.DataFrame({'a': [0, 1, 2, 3]})
pickbytes = pickle.dumps(df)
b64_pickbytes = base64.b64encode(pickbytes)

url = 'http://localhost:8000/api/DataFrame/'
payload = {'df_object':b64_pickbytes}
r = requests.post(url=url, data=payload)
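
In step 4, `b64_pickbytes` is a `bytes` object. That works for the form-encoded `data=` payload used above, but it cannot be placed in a JSON body directly. The sketch below assumes you would rather send JSON (for example via the `json=` argument of `requests.post`); the dict stands in for the DataFrame and no request is actually made:

```python
import base64
import json
import pickle

obj = {"a": [0, 1, 2, 3]}  # stand-in for the DataFrame

b64_bytes = base64.b64encode(pickle.dumps(obj))

# bytes are not JSON serializable, so json.dumps raises TypeError
try:
    json.dumps({"df_object": b64_bytes})
    json_accepts_bytes = True
except TypeError:
    json_accepts_bytes = False

# Decoding to ASCII text makes the payload valid for a JSON body
payload = {"df_object": b64_bytes.decode("ascii")}
body = json.dumps(payload)

# What the receiving side would do with that JSON body
received = json.loads(body)["df_object"]
restored = pickle.loads(base64.b64decode(received))
```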

5. To retrieve the data and re-create the DataFrame:

 >>> new = DataFrame.objects.first()
 >>> byt = new.df_object
 >>> s = pickle.loads(byt)
 >>> s
    a
 0  0
 1  1
 2  2
 3  3
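
Depending on the database backend, the BinaryField value may come back as a `memoryview` rather than `bytes` (the question mentions a memoryview). A minimal sketch of handling both cases, and of detecting the empty value described in the question, assuming a hypothetical helper `load_pickled` and a plain dict in place of the DataFrame:

```python
import pickle


def load_pickled(stored):
    """Unpickle a value read from a binary column (bytes or memoryview)."""
    raw = bytes(stored)  # copies a memoryview into bytes; leaves bytes unchanged
    if not raw:
        raise ValueError("stored value is empty; nothing was saved")
    # Only unpickle data you trust: pickle.loads can execute arbitrary code
    return pickle.loads(raw)


data = pickle.dumps({"a": [0, 1, 2, 3]})
from_bytes = load_pickled(data)
from_view = load_pickled(memoryview(data))
```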

Helpful posts and docs related to the question:

[1] https://stackoverflow.com/a/33432733/10778349
[2] https://stackoverflow.com/a/31624941/10778349
[3] https://docs.python.org/3/library/stdtypes.html#memoryview
[4] https://www.django-rest-framework.org/api-guide/fields/#custom-fields
[5] https://www.django-rest-framework.org/tutorial/1-serialization/
Jaco

What does your serializer look like? It's possible that the binary data is being saved (the response is 201) but your serializer doesn't know how to render a binary field as text. Please confirm that the binary data is being stored by inspecting the row in the database directly.

It looks like DRF cannot handle BinaryField out of the box. See "How to use custom serializers fields in my HyeprlinkedModelSerializer".

Try

# serializers.py
import base64

from django.db import models
from rest_framework import serializers

class MyBinaryField(serializers.Field):
    def to_representation(self, obj):
        # Stored bytes -> base64 text for the API response
        return base64.b64encode(obj).decode('ascii')
    def to_internal_value(self, data):
        # Incoming base64 -> raw bytes (base64.encodestring was removed in Python 3.9)
        return base64.b64decode(data)

class DataFrameSerializer(serializers.HyperlinkedModelSerializer):
    serializer_field_mapping = (
        serializers.ModelSerializer.serializer_field_mapping.copy()
    )
    serializer_field_mapping[models.BinaryField] = MyBinaryField
    class Meta:
        model = DataFrame
        fields = '__all__'
Harry Moreno
  • Hi Harry, thanks for your answer. I have tried various ways to see whether it contains any value. I can see that a memory address is allocated, but when I try to read from that address it seems to return nothing. I have added this to my question above. – Jaco Sep 09 '19 at 16:08
  • use a db gui like dbeaver to inspect the df_object column on the new record. – Harry Moreno Sep 09 '19 at 16:20
  • I did this check with DBeaver (thanks; I tried the same with phpPgAdmin, but it wasn't as clear as with DBeaver), and it confirms the entry is empty; see the screenshot above. – Jaco Sep 09 '19 at 16:36
  • Share your serializer and view. You can have a sanity check in your view by printing a string. Also what does adding `print('stringify binary', base64.b64decode(self.DfObj))` show? – Harry Moreno Sep 09 '19 at 17:45
  • I've included the serializer and view. It's a new area for me and a bit confusing. Perhaps I would need to write a POST view to get it to work? As in https://stackoverflow.com/questions/45205039/post-to-django-rest-framework. – Jaco Sep 09 '19 at 21:56
  • Why does your api respond with `"url": "http://www.foreignserver.com/api/DataFrame/17/",` that's not standard drf behavior. Try sticking to the defaults when starting out. There is https://devdocs.io/pandas~0.25/reference/api/pandas.dataframe.to_pickle and on your `DataFrameViewSet` add the method I'll add to my answer. – Harry Moreno Sep 10 '19 at 02:54