I have a BSON-formatted string in a file.

I want to read that file and get the JSON it encodes.

I was looking at this example:

>>> from bson import BSON
>>> bson_string = BSON.encode({"hello": "world"})
>>> bson_string
'\x16\x00\x00\x00\x02hello\x00\x06\x00\x00\x00world\x00\x00'
>>> bson_string.decode()
{u'hello': u'world'}

from http://docs.mongodb.org/meta-driver/latest/legacy/bson/

But what I have is, say:

string = '\x16\x00\x00\x00\x02hello\x00\x06\x00\x00\x00world\x00\x00'

Now I want to parse this into JSON. How do I do this? Thanks


Can you try to parse this BSON-formatted string:

s = """'\x93\x01\x00\x00\x02_id\x00\x1a\x00\x00\x00auromotiveengineering.com\x00\x04name_servers\x00_\x00\x00\x00\x020\x00\x17\x00\x00\x00ns-2.activatedhost.com\x00\x021\x00\x17\x00\x00\x00ns-1.activatedhost.com\x00\x022\x00\x17\x00\x00\x00ns-3.activatedhost.com\x00\x00\nreputation\x00\x04categories\x00\x05\x00\x00\x00\x00\x03host_act\x00\xd7\x00\x00\x00\x03bnMtMi5hY3RpdmF0ZWRob3N0LmNvbQ==\x00$\x00\x00\x00\x10seen_first\x00\x00,\xe7F\x10seen_last\x00\x80 \xebF\x00\x03bnMtMy5hY3RpdmF0ZWRob3N0LmNvbQ==\x00$\x00\x00\x00\x10seen_first\x00\x00,\xe7F\x10seen_last\x00\x80 \xebF\x00\x03bnMtMS5hY3RpdmF0ZWRob3N0LmNvbQ==\x00$\x00\x00\x00\x10seen_first\x00\x00,\xe7F\x10seen_last\x00\x80 \xebF\x00\x00\x00'"""

This is what I did. I started with a jsonstring:

   s = """'{ "_id" : "auromotiveengineering.com", "categories" : [ ], "host_act" : { "bnMtMi5hY3RpdmF0ZWRob3N0LmNvbQ==" : { "seen_first" : 1189555200, "seen_last" : 1189814400 }, "bnMtMS5hY3RpdmF0ZWRob3N0LmNvbQ==" : { "seen_first" : 1189555200, "seen_last" : 1189814400 }, "bnMtMy5hY3RpdmF0ZWRob3N0LmNvbQ==" : { "seen_first" : 1189555200, "seen_last" : 1189814400 } }, "name_servers" : [ \t"ns-2.activatedhost.com", \t"ns-1.activatedhost.com", \t"ns-3.activatedhost.com" ], "reputation" : null }"""

Then I loaded this string:

jsn = json.loads(s)

bson_string = BSON.encode(jsn)

Then I copied and pasted the value of bson_string,

so bson_string = """'\x93\x01\x00\x00\x02_id\x00\x1a\x00\x00\x00auromotiveengineering.com\x00\x04name_servers\x00_\x00\x00\x00\x020\x00\x17\x00\x00\x00ns-2.activatedhost.com\x00\x021\x00\x17\x00\x00\x00ns-1.activatedhost.com\x00\x022\x00\x17\x00\x00\x00ns-3.activatedhost.com\x00\x00\nreputation\x00\x04categories\x00\x05\x00\x00\x00\x00\x03host_act\x00\xd7\x00\x00\x00\x03bnMtMi5hY3RpdmF0ZWRob3N0LmNvbQ==\x00$\x00\x00\x00\x10seen_first\x00\x00,\xe7F\x10seen_last\x00\x80 \xebF\x00\x03bnMtMy5hY3RpdmF0ZWRob3N0LmNvbQ==\x00$\x00\x00\x00\x10seen_first\x00\x00,\xe7F\x10seen_last\x00\x80 \xebF\x00\x03bnMtMS5hY3RpdmF0ZWRob3N0LmNvbQ==\x00$\x00\x00\x00\x10seen_first\x00\x00,\xe7F\x10seen_last\x00\x80 \xebF\x00\x00\x00
"""

and when I try to decode this, it throws an error :(

Another string where I get an error:

._idbrusselscityreporter.comcategorieshost_act�bnMzMC5kb21haW5jb250cm9sLmNvbQ==$seen_first�hLseen_last��NbnMyOS5kb21haW5jb250cm9sLmNvbQ==$seen_first�hLseen_last��Nname_serversA0ns30.domaincontrol.com1ns29.domaincontrol.com
frazman

1 Answer

You can do this to initialize a BSON instance with a string:

>>> s = '\x16\x00\x00\x00\x02hello\x00\x06\x00\x00\x00world\x00\x00'
>>> bson_obj = BSON(s)
>>> bson_obj.decode()
{u'hello': u'world'}
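
For the file case in the question, here is a minimal sketch, assuming the file holds exactly one BSON document. It reads the file in binary mode (on Python 3 the data must be `bytes`, not `str`) and checks the 4-byte little-endian length prefix before anything is decoded. The helper names `read_bson_bytes`, `declared_length` and `looks_like_bson` are made up for this illustration and are not part of the `bson` package. Bytes that pass the check can then be handed to `BSON(data).decode()` as above. The `bad` value shows what happens when a pasted repr keeps its leading `'` and trailing newline: the prefix no longer matches the real size. That may be what is behind the `objsize too large` error mentioned in the comments, but that is only a guess from the pasted strings.

```python
import os
import struct
import tempfile


def read_bson_bytes(path):
    """Read the whole file as raw bytes (binary mode, no text decoding)."""
    with open(path, "rb") as fh:
        return fh.read()


def declared_length(data):
    """Return the document size stored in the first 4 bytes (little-endian int32)."""
    if len(data) < 5:
        raise ValueError("too short to be a BSON document")
    return struct.unpack("<i", data[:4])[0]


def looks_like_bson(data):
    """Cheap structural check: the length prefix matches and the last byte is NUL."""
    return len(data) >= 5 and declared_length(data) == len(data) and data[-1] == 0


good = b"\x16\x00\x00\x00\x02hello\x00\x06\x00\x00\x00world\x00\x00"
# What a copy-pasted repr looks like if the quote and a newline come along.
bad = b"'" + good + b"\n"

# Round-trip through a temporary file, as in the question.
with tempfile.NamedTemporaryFile(delete=False) as tmp:
    tmp.write(good)
from_file = read_bson_bytes(tmp.name)
os.remove(tmp.name)

print(looks_like_bson(from_file))       # True
print(declared_length(bad), len(bad))   # 5671 24
```

If the check fails, the bytes were probably altered somewhere between encoding and reading (text-mode I/O, copy and paste, stray quotes), so it is worth fixing that step before calling `decode()`.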
Paulo Almeida
  • It gives me an error, `objsize too large`. I don't know if that is really the case or just malformed. – Paulo Almeida Aug 28 '13 at 20:25
  • It worked for me. I wrote the bson string to a file, opened and read it back, decoded and got the original json. But I removed the initial `'` from your `jsonstring`, which is not closed. Was that a typo when you copied the string here or maybe that's the problem? Edit: It should be a typo, or it wouldn't have loaded. – Paulo Almeida Aug 28 '13 at 21:36
  • ehh.. can you take a look at this string.. :( the last one with weird characters in it :( – frazman Aug 28 '13 at 21:58
  • @Fraz, that last one doesn't look like json or bson. How did you generate it? Maybe you can manually turn it into a json string, the fields seem to be there. Or do you have many others like it? If that is the case, it would probably be better to try to understand the format and parse it somehow. – Paulo Almeida Aug 28 '13 at 22:03
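
To make "understand the format" a bit more concrete, here is a rough sketch of how the BSON byte layout can be walked by hand. It assumes only the element types that appear in the strings in this question (string, embedded document, array, null, int32) and rejects everything else. `decode_document` and `_cstring` are hypothetical names for this illustration. For real data the `bson` package is the better tool; this is only meant for seeing where a damaged dump stops making sense.

```python
import struct


def _cstring(buf, pos):
    """Read a NUL-terminated UTF-8 name; return (name, position after the NUL)."""
    end = buf.index(b"\x00", pos)
    return buf[pos:end].decode("utf-8"), end + 1


def decode_document(buf, pos=0, as_list=False):
    """Decode one document starting at pos; return (value, position after it)."""
    (size,) = struct.unpack_from("<i", buf, pos)
    end = pos + size
    if size < 5 or end > len(buf) or buf[end - 1] != 0:
        raise ValueError("bad document length or missing terminator")
    pos += 4
    out = {}
    while pos < end - 1:
        kind = buf[pos]
        name, pos = _cstring(buf, pos + 1)
        if kind == 0x02:    # string: int32 byte count (including NUL), then bytes
            (n,) = struct.unpack_from("<i", buf, pos)
            value = buf[pos + 4:pos + 4 + n - 1].decode("utf-8")
            pos += 4 + n
        elif kind == 0x03:  # embedded document
            value, pos = decode_document(buf, pos)
        elif kind == 0x04:  # array: a document whose keys are "0", "1", ...
            value, pos = decode_document(buf, pos, as_list=True)
        elif kind == 0x0A:  # null: no payload
            value = None
        elif kind == 0x10:  # int32
            (value,) = struct.unpack_from("<i", buf, pos)
            pos += 4
        else:
            raise ValueError("unsupported element type 0x%02x" % kind)
        out[name] = value
    return (list(out.values()) if as_list else out), end


doc, _ = decode_document(b"\x16\x00\x00\x00\x02hello\x00\x06\x00\x00\x00world\x00\x00")
print(doc)  # {'hello': 'world'}
```

A dump that has lost its length prefixes or NUL bytes will fail the first check here, which is at least a quick way to tell "not valid BSON any more" apart from "valid BSON with an unexpected type".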