
I have a small JSON file with the following contents:

{
    "IdTitulo": "Jaws",
    "IdDirector": "Steven Spielberg",
    "IdNumber": 8,
    "IdDecimal": "2.33"
}

And there is a schema on my db collection, named test_dec. This is what I used to create the schema:

db.createCollection("test_dec",
{validator: {
    $jsonSchema: {
         bsonType: "object",
         required: ["IdTitulo","IdDirector"],
         properties: {
         IdTitulo: {
                "bsonType": "string",
                "description": "string type, nombre de la pelicula"
            },
         IdDirector: {
                "bsonType": "string",
                "description": "string type, nombre del director"
            },
        IdNumber : {
                "bsonType": "int",
                "description": "number type to test"
            },
        IdDecimal : {
                 "bsonType": "decimal",
                 "description": "decimal type"
                    }
       }
    }}
    })

I've made multiple attempts to insert the data. The problem is in the IdDecimal field value.

Some of the attempts, replacing the IdDecimal line with:

 "IdDecimal": 2.33

 "IdDecimal": {"$numberDecimal": "2.33"}

 "IdDecimal": NumberDecimal("2.33")

None of them work. The second one is the official form given in the MongoDB manual (mongodb-extended-json), and the error it produces is: bson.errors.InvalidDocument: key '$numberDecimal' must not start with '$'.
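
A minimal sketch of why that error likely appears, assuming the file is parsed with the standard `json` module (the variable names here are illustrative): the plain parser keeps the Extended JSON wrapper as an ordinary nested dictionary, so the document handed to PyMongo still contains a key that starts with `$`.

```python
import json

# Extended JSON text, as it would be read from the file.
text = '{"IdTitulo": "Jaws", "IdDecimal": {"$numberDecimal": "2.33"}}'

# The standard json module knows nothing about Extended JSON type wrappers.
doc = json.loads(text)

# IdDecimal is still a plain dict whose only key starts with '$',
# which is what the InvalidDocument error complains about.
decimal_field = doc["IdDecimal"]
print(type(decimal_field).__name__, list(decimal_field))
```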

I am currently using Python to load the JSON. I've been playing around with this file:

import os,sys
import re
import io
import json
from pymongo import MongoClient
from bson.raw_bson import RawBSONDocument
from bson.json_util import CANONICAL_JSON_OPTIONS,dumps,loads
import bsonjs as bs

#connection
client = MongoClient('localhost',27018,document_class=RawBSONDocument)
db     = client['myDB']
coll   = db['test_dec']   
other_col = db['free']                                                                                        

for fname in os.listdir('/mnt/win/load'):                                                                               
    num = re.findall("\d+", fname)

    if num:

       with io.open(fname, encoding="ISO-8859-1") as f:

            doc_data = loads(dumps(f,json_options=CANONICAL_JSON_OPTIONS))

            print(doc_data) 

            test = '{"idTitulo":"La pelicula","idRelease":2019}'
            raw_bson = bs.loads(test)
            load_raw = RawBSONDocument(raw_bson)

            db.other_col.insert_one(load_raw)


client.close()

I am using a json file. If I try to parse anything like Decimal128('2.33') the output is "ValueError: No JSON object could be decoded", because my json has an invalid format.

The result of

    db.other_col.insert_one(load_raw) 

is that the content of "test" is inserted. But I cannot use doc_data with RawBSONDocument, because it fails with:

  TypeError: unpack_from() argument 1 must be string or buffer, not list:

When I manage to pass the JSON directly to RawBSONDocument, each raw line of the file ends up as a separate field, and the record in the database looks like this sample:

   {
    "_id" : ObjectId("5eb2920a34eea737626667c2"),
    "0" : "{\n",
    "1" : "\t\"IdTitulo\": \"Gremlins\",\n",
    "2" : "\t\"IdDirector\": \"Joe Dante\",\n",
    "3" : "\t\"IdNumber\": 6,\n",
    "4" : "\"IdDate\": {\"$date\": \"2010-06-18T:00.12:00Z\"}\t\n",
    "5" : "}\n"
     }

It seems it is not that simple to load Extended JSON into MongoDB. I need the extended version because I want to use schema validation.

Oleg pointed out that it is numberDecimal and not NumberDecimal as I had it before. I've fixed the JSON file, but nothing changed.

Executed:

with io.open(fname, encoding="ISO-8859-1") as f:
      doc_data = json.load(f)                
      coll.insert(doc_data)

And the json file:

 {
    "IdTitulo": "Gremlins",
    "IdDirector": "Joe Dante",
    "IdNumber": 6,
    "IdDecimal": {"$numberDecimal": "3.45"}
 }
powerPixie
  • Does this answer your question? [Python - Pymongo MongoDB 3.4 - NumberDecimal](https://stackoverflow.com/questions/44283527/python-pymongo-mongodb-3-4-numberdecimal) – Fraction Apr 29 '20 at 10:43
  • No. I am using a json. If I try to replace the IdDecimal with Decimal128("2.33") the error states : "ValueError: No JSON object could be decoded" – powerPixie Apr 29 '20 at 11:24

4 Answers


One more roll of the dice from me. If you are using schema validation as you are, I would recommend defining a class and being explicit with defining each field and how you propose to convert the field to the relevant python datatypes. While your solution is generic, the data structure has to be rigid to match the validation.

IMO this is clearer and you have control over any errors etc within the class.

Just to confirm I ran the schema validation and this works with the supplied validation.

from pymongo import MongoClient
from bson.decimal128 import Decimal128
import dateutil.parser
import json

class Film:
    def __init__(self, file):
        # Parse the plain JSON file, then convert each field explicitly
        data = file.read()
        loaded = json.loads(data)
        self.IdTitulo = loaded.get('IdTitulo')
        self.IdDirector = loaded.get('IdDirector')
        # Decimal128 takes the decimal value as a string, e.g. "2.33"
        self.IdDecimal = Decimal128(loaded.get('IdDecimal'))
        self.IdNumber = int(loaded.get('IdNumber'))
        self.IdDateTime = dateutil.parser.parse(loaded.get('IdDateTime'))

    def insert_one(self, collection):
        collection.insert_one(self.__dict__)

client = MongoClient()
mycollection = client.mydatabase.test_dec

with open('c:/temp/1.json', 'r') as jfile:
    film = Film(jfile)
    film.insert_one(mycollection)

gives:

> db.test_dec.findOne()
{
        "_id" : ObjectId("5eba79eabf951a15d32843ae"),
        "IdTitulo" : "Jaws",
        "IdDirector" : "Steven Spielberg",
        "IdDecimal" : NumberDecimal("2.33"),
        "IdNumber" : 8,
        "IdDateTime" : ISODate("2020-05-12T10:08:21Z")
}

>

JSON file used:

{
    "IdTitulo": "Jaws",
    "IdDirector": "Steven Spielberg",
    "IdNumber": 8,
    "IdDecimal": "2.33",
    "IdDateTime": "2020-05-12T11:08:21+0100"
}
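
The "control over any errors" point can be sketched as follows. This is not the class above but a simplified, standard-library-only variant: `FIELD_CONVERTERS` and `convert_film` are hypothetical names, and `decimal.Decimal` stands in for `bson.decimal128.Decimal128` so the sketch runs without PyMongo.

```python
import json
from decimal import Decimal, InvalidOperation

# Field name -> converter; Decimal is a stand-in for Decimal128 (assumption).
FIELD_CONVERTERS = {
    "IdTitulo": str,
    "IdDirector": str,
    "IdNumber": int,
    "IdDecimal": Decimal,
}


def convert_film(raw):
    """Convert known fields, collecting a message for each one that fails."""
    converted, errors = {}, []
    for name, convert in FIELD_CONVERTERS.items():
        if name not in raw:
            continue  # the schema's 'required' list decides whether this matters
        try:
            converted[name] = convert(raw[name])
        except (ValueError, TypeError, InvalidOperation) as exc:
            errors.append("%s: %r (%s)" % (name, raw[name], type(exc).__name__))
    return converted, errors


good, good_errors = convert_film(
    json.loads('{"IdTitulo": "Jaws", "IdNumber": 8, "IdDecimal": "2.33"}'))
bad, bad_errors = convert_film(
    json.loads('{"IdTitulo": "Jaws", "IdNumber": "eight", "IdDecimal": "hello"}'))
print(good, good_errors)
print(bad, bad_errors)
```

Collecting the errors instead of raising on the first one makes it possible to report every bad field in a file before anything is sent to MongoDB.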
Belly Buster

JSON with type information is called Extended JSON. Following the examples, construct extended json for your data:

ext_json = '''
{
    "IdTitulo": "Jaws",
    "IdDirector": "Steven Spielberg",
    "IdNumber": 8,
    "IdDecimal": {"$numberDecimal":"2.33"}
}
'''

In Python, use json_util to load extended json into a Python dictionary:

from bson.json_util import loads

doc = loads(ext_json)

print(doc)

# {u'IdTitulo': u'Jaws', u'IdDirector': u'Steven Spielberg', u'IdDecimal': Decimal128('2.33'), u'IdNumber': 8}

The result of this load is sometimes referred to as a "BSON document" but it is not BSON, which is binary. "BSON" in this context really means that some values are not of python standard library types. The "document" part basically means the object is a dictionary.

You will notice that IdDecimal is of a non-standard library type:

print(type(doc['IdDecimal']))

# <class 'bson.decimal128.Decimal128'>

To insert this dictionary into MongoDB, follow the PyMongo tutorial:

from pymongo import MongoClient
client = MongoClient('localhost', 14420)

db = client.test_database

collection = db.test_collection

collection.insert_one(doc)

print(doc)
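
The same approach applies when the Extended JSON lives in a file: read the file's text first, then parse that text. The sketch below uses only the standard library so it runs without PyMongo; `extended_json_hook` and `load_extended_json` are hypothetical helper names, and `decimal.Decimal` stands in for `bson.decimal128.Decimal128`. With PyMongo installed, `bson.json_util.loads(text)` performs this conversion itself.

```python
import json
import os
import tempfile
from decimal import Decimal


def extended_json_hook(decimal_type=Decimal):
    """Build an object_hook that unwraps {"$numberDecimal": "..."} values.

    decimal_type is an assumption of this sketch: pass
    bson.decimal128.Decimal128 to get values PyMongo can insert.
    """
    def hook(obj):
        if set(obj) == {"$numberDecimal"}:
            return decimal_type(obj["$numberDecimal"])
        return obj
    return hook


def load_extended_json(path, decimal_type=Decimal):
    # Read the file's contents first, then parse the text.
    with open(path, "r") as f:
        text = f.read()
    return json.loads(text, object_hook=extended_json_hook(decimal_type))


# Write a sample file so the example is self-contained.
sample = '{"IdTitulo": "Jaws", "IdNumber": 8, "IdDecimal": {"$numberDecimal": "2.33"}}'
with tempfile.NamedTemporaryFile("w", suffix=".json", delete=False) as tmp:
    tmp.write(sample)

doc = load_extended_json(tmp.name)
os.remove(tmp.name)
print(doc)
```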
D. SM
  • It's a link to a page with generic comments. Not useful in my case. But thanks. – powerPixie May 07 '20 at 09:55
  • What is it that you are asking then? Your question has no question. – D. SM May 07 '20 at 17:01
  • I think I exposed my problem when trying to read from a json file to insert the content into a collection in MongoDB using schema validation. I've posted my scripts and output results. – powerPixie May 08 '20 at 07:35
  • the n in $numberDecimal is a lower cased one. – D. SM May 08 '20 at 16:57
  • Absolutely right, thanks!! But the error remains. I updated the post. – powerPixie May 09 '20 at 11:58
  • Updated the answer – D. SM May 10 '20 at 23:54
  • Well, the ext_json is just like what I did in my line of code test = '{"idTitulo":"La pelicula","idRelease":2019}'. I said it worked, but you have the file in the same script of your import command. The only difference in your approach is the use of line breaks. The problem is when you have a json in a directory and you open the file and try to parse it to insert in MongoDB. – powerPixie May 11 '20 at 05:56
  • Read the files' contents then use the code I provided which includes parsing the contents. – D. SM May 11 '20 at 15:26

Finally, I've got the solution and it is using RawBSONDocument.

First the json file:

{
    "IdTitulo": "Dead Snow",
    "IdDirector": "Tommy Wirkola",
    "IdNumber": 11,
    "IdDecimal": {"$numberDecimal": "2.22"}
}

And the validation schema:

db.createCollection("test_dec",
  {validator: {
     $jsonSchema: {
        bsonType: "object",
        required: ["IdTitulo","IdDirector"],
        properties: {
            IdTitulo: {
                "bsonType": "string",
                "description": "string type, nombre de la pelicula"
                },
            IdDirector: {
                "bsonType": "string",
                "description": "string type, nombre del director"
                },
            IdNumber : {
                "bsonType": "int",
                "description": "number type to test"
               },
            IdDecimal : {
                 "bsonType": "decimal",
                 "description": "decimal type"
                }
             }
          }}
   })

So, the collection in this case is "test_dec".

And the Python script that opens the ".json" file, reads it and parses it so it can be inserted into MongoDB:

import json
from bson.raw_bson import RawBSONDocument
from pymongo import MongoClient
import bsonjs

#connection
client = MongoClient('localhost',27018)
db     = client['movieDB']
coll   = db['test_dec']

# open and read the file
with open('1.json', 'r') as jfile:
    data = jfile.read()

    loaded = json.loads(data)
    dumped = json.dumps(loaded, indent=4)
    bson_bytes = bsonjs.loads(dumped)

    coll.insert_one(RawBSONDocument(bson_bytes))


client.close()

The inserted document:

{
    "_id" : ObjectId("5eb971ec6fbab859dfae8a6f"),
    "IdTitulo" : "Dead Snow",
    "IdDirector" : "Toomy Wirkola",
    "IdDecimal" : NumberDecimal("2.22"),
    "IdNumber" : 11
 }

I don't know how it flipped the fields IdDecimal and IdNumber, but it passes the validation and I am really happy.
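
One likely explanation, assuming the script ran on a Python version where plain dictionaries do not preserve insertion order (before 3.7): the `json.loads`/`json.dumps` round trip rebuilds the document from an unordered dict. A sketch of keeping the file's order with `object_pairs_hook`:

```python
import json
from collections import OrderedDict

text = ('{"IdTitulo": "Dead Snow", "IdDirector": "Tommy Wirkola", '
        '"IdNumber": 11, "IdDecimal": {"$numberDecimal": "2.22"}}')

# OrderedDict keeps keys in the order they appear in the text
# on every Python version.
loaded = json.loads(text, object_pairs_hook=OrderedDict)
dumped = json.dumps(loaded)

print(list(loaded))
```

Field order does not affect the schema validation, so this only matters if the stored layout should match the file.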

I tried a document with 'hello' instead of a number in NumberDecimal and the insertion resulted in:

 {
    "_id" : ObjectId("5eb973b76fbab859dfae8ecd"),
    "IdTitulo" : "Shining",
    "IdDirector" : "Stanley Kubrick",
    "IdDecimal" : NumberDecimal("NaN"),
    "IdNumber" : 19
  }
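
That result suggests the invalid string was stored as NaN rather than rejected, so the schema's `decimal` check still passes. A minimal sketch of a pre-insert guard, assuming the value arrives as the string inside `$numberDecimal`; `is_valid_decimal_string` is a hypothetical helper and uses only the standard `decimal` module:

```python
from decimal import Decimal, InvalidOperation


def is_valid_decimal_string(value):
    """Return True only for values that parse to a finite decimal number."""
    try:
        parsed = Decimal(value)
    except (InvalidOperation, TypeError, ValueError):
        return False
    # Decimal("NaN") and Decimal("Infinity") parse fine, so reject them explicitly.
    return parsed.is_finite()


print(is_valid_decimal_string("2.22"))
print(is_valid_decimal_string("hello"))
print(is_valid_decimal_string("NaN"))
```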

Thanks to all who tried to help. Especially Oleg!!! Thank you for being so patient.

powerPixie

Could you not just use bson.decimal128.Decimal128? Or am I missing something?

from pymongo import MongoClient
from bson.decimal128 import Decimal128

db = MongoClient()['mydatabase']

data = {
    "IdTitulo": "Jaws",
    "IdDirector": "Steven Spielberg",
    "IdNumber": 8,
    "IdDecimal": "2.33"
}

data['IdDecimal'] = Decimal128(data['IdDecimal'])
db.other_col.insert_one(data)
Belly Buster
  • The problem is basically opening a JSON file. All the examples I saw, including a solution someone posted here, place a few lines of JSON inside the script, but they don't actually open a JSON file and load its contents into a MongoDB collection with schema validation. If it weren't for the validation, it would work fine. But if you have a date or decimal type in your schema, or any other extended feature, the only workaround I found is the code I posted myself after struggling with the issue. The key is to use read(), some dumps and loads, and RawBSONDocument. – powerPixie May 12 '20 at 07:05
  • If you have a lot of these, what I would recommend is creating a class for each type, having a method to read the JSON from a file into the class and then another method to convert the relevant fields to the corresponding pymongo datatypes (e.g. datetime.datetime, bson.decimal128.Decimal128 etc.) and then a method to do the insert / upsert into mongodb. – Belly Buster May 12 '20 at 09:54
  • I know you have "solved" this yourself but I've added another answer which explains how this could work. – Belly Buster May 12 '20 at 10:36