I intend to store Pandas DataFrames in MongoDB using the Python MongoEngine framework, coercing each DataFrame to a Python dict via df.to_dict() and storing it as a nested Document attribute. To minimize the code needed for the round trip from Pandas DataFrame to BSON and back, I'm using a custom field type called DataFrameField, defined in this gist, which coerces the DataFrame to a Python dict and back within its __set__ and __get__ methods.

This works great when setting the DataFrameField using dot notation, as in:

import pandas as pd
import numpy as np
from mongoengine import *

a_pandas_data_frame = pd.DataFrame({
    'goods': ['a', 'a', 'b', 'b', 'b'],
    'stock': [5, 10, 30, 40, 10],
    'category': ['c1', 'c2', 'c1', 'c2', 'c1'],
    'date': pd.to_datetime(['2014-01-01', '2014-02-01', '2014-01-06', '2014-02-09', '2014-03-09'])
})

class my_data(Document):
    data_frame = DataFrameField()  # defined in the referenced gist

foo = my_data()
foo.data_frame = a_pandas_data_frame

But if I pass a_pandas_data_frame to the constructor instead, I get:

>>> bar = my_data(data_frame = a_pandas_data_frame)
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
  File "C:\Users\MPGWRK-006\Anaconda2\lib\site-packages\mongoengine\base\document.py", line 116, in __init__
    setattr(self, key, value)
  File "C:\Users\MPGWRK-006\Anaconda2\lib\site-packages\mongoengine\base\document.py", line 186, in __setattr__
    super(BaseDocument, self).__setattr__(name, value)
  File "<stdin>", line 18, in __set__
ValueError: value is not a pandas.DataFrame instance

If I add a print statement such as print value to the __set__ method and then call the constructor, it prints:

['category', 'date', 'goods', 'stock']

which is the list of column names of the data frame (i.e. list(a_pandas_data_frame.columns)). Is there any way to make the MongoEngine Document constructor pass the original object, unmodified, on to the __set__ method?

Thanks!

PS: I also asked this question at the [MongoEngine repo](https://github.com/MongoEngine/mongoengine/issues/1597), but there are about 300 open issues, so I don't expect a response there any time soon.

Jthorpe

1 Answer

Digging through the source, it appears you need to define a to_python method on your DataFrameField; otherwise it falls back to mongoengine.fields.DictField's to_python method.

mongoengine.fields.DictField's to_python method is essentially ComplexBaseField's to_python method. When it receives a DataFrame, this method decides that the object is list-like and returns the values obtained by enumerating the DataFrame instance, which are its column names.

And here is the part of the constructor that calls to_python on the field object:

if key in self._fields or key in ('id', 'pk', '_cls'):
    if __auto_convert and value is not None:
        field = self._fields.get(key)
        if field and not isinstance(field, FileField):
            value = field.to_python(value)

Hence, in your case you could simply define to_python on DataFrameField as a pass-through:

def to_python(self, value):
    return value
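
To make the mechanism concrete, here is a rough, standard-library-only model of what I described above. Every name in it (FakeFrame, ListLikeField, PassThroughField, construct) is a hypothetical stand-in I made up for illustration, not MongoEngine's or pandas' actual code, and the to_python logic is a simplification of the behavior rather than a copy of the source. It only assumes that iterating the frame yields its column names, as your print output shows.

```python
# Hypothetical stand-ins for illustration only; this is not MongoEngine code.


class FakeFrame:
    """Mimics a single DataFrame trait: iterating it yields column names."""

    def __init__(self, columns):
        self.columns = dict(columns)

    def __iter__(self):
        return iter(self.columns)


class ListLikeField:
    """Simplified field whose to_python enumerates anything without .items()."""

    def to_python(self, value):
        if hasattr(value, "items"):
            return value  # dict-like values are left as they are
        # Anything else that can be iterated is treated as a list.
        return [item for _, item in enumerate(value)]


class PassThroughField(ListLikeField):
    """Same field, with to_python overridden to return the value untouched."""

    def to_python(self, value):
        return value


def construct(field, value, auto_convert=True):
    """Models the constructor branch: convert first, then pass to the setter."""
    if auto_convert and value is not None:
        value = field.to_python(value)
    return value  # this is what __set__ would end up receiving


frame = FakeFrame({"goods": ["a", "b"], "stock": [5, 10]})

converted = construct(ListLikeField(), frame)
untouched = construct(PassThroughField(), frame)

print(converted)           # the column names, not the frame itself
print(untouched is frame)  # the override hands the original object through
```

With the inherited to_python, the setter sees a plain list of column names, which is why your isinstance check raises. With the override, the original object reaches __set__ unchanged, so the constructor behaves like the dot-notation assignment.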
Ashwini Chaudhary