2

I'm doing a rather simple insert into a local MongoDB sourced from a Python pandas DataFrame. Essentially I'm calling dataframe.loc[n].to_dict() and getting my dictionary directly from the df. All is well so far until I attempt the insert, where I get a 'Cannot encode object' error. Looking at the dict directly, everything seemed fine, but then (while writing this question) it dawned on me to check each type in the dict, and I found that a long ID number had been converted to a numpy.int64 instead of a plain int (which, when I created the dict manually as an int, would insert fine).

So, I was unable to find anything in the pandas documentation on arguments to to_dict that would let me override this behavior, and while there are brute-force ways to fix this issue, there must be a more elegant way to sort it out without resorting to that sort of thing.

The question, then: how do I convert a row of a DataFrame to a dict for insertion into MongoDB, ensuring I use only acceptable content types? OR, can I back up further here and use a simpler approach to get each row of a DataFrame to be a document within Mongo?
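(For illustration, this is the sort of brute-force conversion I mean — a minimal sketch with made-up data. numpy scalar types expose an .item() method that returns the equivalent native Python value, which pymongo's BSON encoder accepts; the collection and insert call are left as a comment since they depend on your setup.)

```python
import pandas as pd

df = pd.DataFrame({'Screen Name': ['XXXXXXXXXX'],
                   'Followers': [13],
                   'Twitter ID': [1234567890]})

def to_bson_safe(row):
    # numpy scalars (numpy.int64, numpy.float64, ...) have .item(),
    # which unboxes them into plain Python values; everything else
    # (str, int, ...) is passed through untouched.
    return {key: (value.item() if hasattr(value, 'item') else value)
            for key, value in row.to_dict().items()}

doc = to_bson_safe(df.loc[0])
# collection.insert(doc)  # pymongo insert of `doc` now encodes cleanly
```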

Thanks

As requested, here is an addendum to the post with a sample of the data I am using.

{'Account Created': 'about 3 hours ago',
 'Followers': 13,
 'Following': 499,
 'Screen Name': 'XXXXXXXXXX',
 'Status': 'Alive',
 'Tweets': 12,
 'Twitter ID': 0000000000L}

This is directly from the to_dict output that faulted on insert. I copied this directly into a 'test' dict and that worked perfectly fine. If I print out the values of each of the dicts I get the following...

to_dict = ['Alive', 'a_aheref77', 'about 3 hours ago', 12, 13, 499, 0000000000L, ObjectId('551bd8cfae89e9370851aa64')]

test = ['Alive', 'XXXXXXXX', 'about 3 hours ago', 499, 13, 12, 0000000000, ObjectId('551bd6fdae89e9370851aa63')]

The only difference (as far as I can tell) is the long int which, interestingly enough, when I did the Mongo insert shows up as 'NumberLong' within the document. Hope this helps clarify some.
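(For anyone hitting the same error, a quick way to surface the offending type is to print the concrete type of every value in the row dict. A minimal sketch, using field names from the sample above with made-up values — on the pandas version in question, the long ID comes back as numpy.int64 rather than a plain int:)

```python
import pandas as pd

df = pd.DataFrame({'Status': ['Alive'], 'Twitter ID': [1234567890]})
row = df.loc[0].to_dict()

# Print the concrete type of each value; any numpy scalar types
# here are what trip up pymongo's BSON encoder.
for key, value in sorted(row.items()):
    print(key, type(value))
```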

Phillip Cloud
Thatch

2 Answers

3

Take a look at the odo library, in particular the mongodb docs. Pandas isn't likely to grow any kind of to_mongo method in the near future, so odo is where this sort of functionality should go. Here's an example with a simple DataFrame:

In [13]: import pandas as pd

In [14]: from odo import odo

In [15]: df = pd.DataFrame({'a': [1, 2, 3], 'b': list('abc')})

In [17]: m = odo(df, 'mongodb://localhost/db::t')

In [18]: list(m.find())
Out[18]:
[{u'_id': ObjectId('551bfb20362e696200d568d9'), u'a': 1, u'b': u'a'},
 {u'_id': ObjectId('551bfb20362e696200d568da'), u'a': 2, u'b': u'b'},
 {u'_id': ObjectId('551bfb20362e696200d568db'), u'a': 3, u'b': u'c'}]

You can get the required deps and odo by doing

conda install odo pymongo --channel blaze

or

pip install odo
Phillip Cloud
  • Thanks Phillip. I've not got my head completely around the odo library yet, but it did the trick in this instance regardless. I appreciate your help. – Thatch Apr 02 '15 at 06:09
-1

Python integers are stored as arbitrary-precision numbers, which MongoDB does not support. You need to convert them into normal int64 or string objects first. When you manually copied and pasted the values, it probably worked because the Python interpreter converted the integers properly into int64.

Xiaoji Liu