5

I dumped a MongoDB collection using the `mongodump` command. The output is a dump directory containing these files:

dump/
    |___coll.bson
    |___coll.metadata.json

How can I load the exported files into a list of dictionaries that I can work with in Python? I tried the following, and neither approach worked:

with open('dump/coll.bson', 'rb') as f:
    coll_raw = f.read()
import json
coll = json.loads(coll_raw)

# Using pymongo
from bson.json_util import loads
coll = loads(coll_raw)

ValueError: No JSON object could be decoded
CentAu

2 Answers

10

You should try:

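import bson  # the bson package that ships with PyMongo

with open('dump/coll.bson', 'rb') as f:
    coll_raw = f.read()

# decode_all returns a list of dicts, one per document in the dump
coll = bson.decode_all(coll_raw)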
Yash Mehrotra
  • This probably means that your BSON is invalid. Can you send me a sample BSON object that you are trying to decode? – Yash Mehrotra Dec 16 '15 at 19:18
  • The bson file is the dump I got with `mongodump`. The file is huge. Let me see if I can replicate the error with a small database. – CentAu Dec 16 '15 at 19:20
  • Did you try running `BSON.is_valid(coll_raw)`? – Quirk Dec 16 '15 at 19:29
  • @YashMehrotra here's the file: https://www.dropbox.com/s/6yyssja0la0ctln/dump.zip?dl=0 Direct output of mongodump – CentAu Dec 16 '15 at 19:31
  • @Quirk `object 'BSON' has no attribute 'is_valid'` – CentAu Dec 16 '15 at 19:33
  • @CentAu My bad. Try `bson.is_valid(coll_raw)`. Ref: https://api.mongodb.org/python/current/api/bson/ – Quirk Dec 16 '15 at 19:36
  • @Quirk It returns false. But the file is the output of MongoDB. If this isn't valid BSON, how can I get a JSON/BSON dump from Mongo that will work in Python? – CentAu Dec 16 '15 at 19:40
  • @CentAu Looks like when you took the dump it got corrupted. Have you tried any other collection or redumping? – Quirk Dec 16 '15 at 19:41
  • Let us [continue this discussion in chat](http://chat.stackoverflow.com/rooms/98163/discussion-between-quirk-and-centau). – Quirk Dec 16 '15 at 19:43
  • Hi @CentAu , I decoded your bson. Please check the edited answer. – Yash Mehrotra Dec 16 '15 at 19:46
0

I know this was answered a long time ago, but you could try decoding each document separately and then you'd know which doc is causing the problem.

I use this library: https://github.com/bauman/python-bson-streaming

from bsonstream import KeyValueBSONInput

# Decode one document at a time, so a failure points at the offending document
with open("restaurants.bson", 'rb') as f:
    stream = KeyValueBSONInput(fh=f)
    for dict_data in stream:
        print(dict_data)

I see 25359 records which all seem to decode to something like:

{u'_id': ObjectId('5671bb2e111bb7b9a7ce4d9a'),
 u'address': {u'building': u'351',
              u'coord': [-73.98513559999999, 40.7676919],
              u'street': u'West   57 Street',
              u'zipcode': u'10019'},
 u'borough': u'Manhattan',
 u'cuisine': u'Irish',
 u'grades': [{u'date': datetime.datetime(2014, 9, 6, 0, 0),
              u'grade': u'A',
              u'score': 2},
             {u'date': datetime.datetime(2013, 7, 22, 0, 0),
              u'grade': u'A',
              u'score': 11},
             {u'date': datetime.datetime(2012, 7, 31, 0, 0),
              u'grade': u'A',
              u'score': 12},
             {u'date': datetime.datetime(2011, 12, 29, 0, 0),
              u'grade': u'A',
              u'score': 12}],
 u'name': u'Dj Reynolds Pub And Restaurant',
 u'restaurant_id': u'30191841'}
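
If you would rather not add a dependency just to find the bad document, the framing can be checked by hand. The sketch below is only an illustration, not part of either library: `split_bson_documents` is a hypothetical helper name. It relies on two facts about the BSON format: each document starts with its total size as a little-endian int32, and each document ends with a single NUL byte. It walks a `mongodump` file, yields each document's raw bytes with its byte offset, and raises as soon as the framing stops making sense.

```python
import struct


def split_bson_documents(data):
    """Yield (offset, raw_bytes) for each document in concatenated BSON data.

    Hypothetical helper for illustration. Raises ValueError naming the byte
    offset of the first document whose framing is invalid.
    """
    offset = 0
    total = len(data)
    while offset < total:
        if total - offset < 4:
            raise ValueError("truncated length prefix at byte %d" % offset)
        # Each BSON document begins with its total size (little-endian int32).
        (size,) = struct.unpack_from("<i", data, offset)
        # The smallest valid document is 5 bytes: 4 for the size, 1 for the NUL.
        if size < 5 or offset + size > total:
            raise ValueError("bad document length %d at byte %d" % (size, offset))
        raw = data[offset:offset + size]
        # Each BSON document ends with a single NUL byte.
        if raw[-1:] != b"\x00":
            raise ValueError("missing terminator at byte %d" % offset)
        yield offset, raw
        offset += size
```

Each `raw` chunk can then be decoded on its own, for example with PyMongo's `bson.BSON(raw).decode()`, so an exception identifies a single document instead of the whole file. PyMongo also provides `bson.decode_file_iter`, which decodes an open file one document at a time and may be enough on its own.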
bauman.space