
I am using Monary.

I have a MongoDB database with many documents (rows) and fields (columns). Currently, I use Monary as follows:

from monary import Monary

client = Monary()
data = client.query("static_database",                 # database name
                    "properties",                      # collection name
                    {},                                # query ({} = all documents)
                    ["Name", "Address1", "Address2"],  # field/column names
                    ["string:72"] * 3)                 # Monary type of each field/column
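Once the field list grows, indexing the returned list by position gets error-prone. A minimal sketch that keys the results by field name, assuming (as in the call above) that Monary's query returns one array per requested field, in the order the fields were given. query_by_name is a hypothetical helper, not part of Monary:

```python
def query_by_name(monary_client, db_name, coll_name, fields, types, query=None):
    """Run a Monary query and return a dict mapping each field name to its array.

    Assumption: monary_client.query(db, coll, query, fields, types) returns one
    array per requested field, in the same order as `fields`.
    """
    # Monary needs exactly one type string per field; fail early otherwise.
    if len(fields) != len(types):
        raise ValueError("need exactly one Monary type per field")
    arrays = monary_client.query(db_name, coll_name, query or {}, fields, types)
    # Pair each field name with its result array.
    return dict(zip(fields, arrays))
```

With the original call this would read data = query_by_name(Monary(), "static_database", "properties", ["Name", "Address1", "Address2"], ["string:72"] * 3), after which data["Address1"] gives that column regardless of its position.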

The properties collection has many fields, and I want to load most, if not all, of them into data. Typing more than 10 field names into a list by hand is a pain.

I also want to use other collections in the future, so a way to get all the field names automatically would help a lot. I have read through the docs and FAQ but haven't found a solution.
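One way to collect every distinct field name without map-reduce is an aggregation pipeline. This is a sketch assuming MongoDB 3.4.4 or newer (needed for $objectToArray); it is a swapped-in technique, not the approach used in the answer below, and it matters because Collection.map_reduce was removed in PyMongo 4. field_names_pipeline and get_field_names are hypothetical helper names, and only top-level fields are listed (embedded documents are not expanded):

```python
def field_names_pipeline():
    """Aggregation pipeline yielding one document per distinct top-level field name."""
    return [
        # Turn each document into an array of {k: fieldName, v: value} pairs.
        {"$project": {"kv": {"$objectToArray": "$$ROOT"}}},
        # One output document per field of each input document.
        {"$unwind": "$kv"},
        # Collapse duplicates: one group per distinct field name.
        {"$group": {"_id": "$kv.k"}},
        # Sort alphabetically so the column order is stable between runs.
        {"$sort": {"_id": 1}},
    ]


def get_field_names(collection):
    """Return the sorted distinct field names of a pymongo Collection."""
    # allowDiskUse lets the $group stage spill to disk on large collections.
    cursor = collection.aggregate(field_names_pipeline(), allowDiskUse=True)
    return [doc["_id"] for doc in cursor]
```

For example, get_field_names(MongoClient()["static_database"]["properties"]) would return a list that can be passed as the field-name argument of client.query.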

KugelBlitz

1 Answer


I ended up using PyMongo to get all the field names, based on the code provided in this answer and the mapReduce documentation.

The next problem was types. Finding the type of each field was tedious, because a field can hold values of different types in different documents. I looked at this question, but the approach there seemed messy, so I ended up taking a single arbitrary document from MongoDB that hopefully contained all the fields.

There should be a better way of determining the types, but I did not implement one. My final (messy) implementation is as follows:

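from bson.code import Code
from pymongo import MongoClient

# Python type name of a sample value -> Monary type string.
# "unicode"/"long" are Python 2 names; "str" is the Python 3 name.
TYPE_MAP = {
    "unicode": "string:50",
    "str": "string:50",
    "list": "string:50",
    "NoneType": "string:50",
    "ObjectId": "id",
    "int": "int64",
    "long": "int64",
    "float": "float64",
    "bool": "bool",
    "datetime": "date",
}

def getColsfromdb(supplier_name):
    client = MongoClient()
    coll = client["supplier_static_database"][supplier_name]

    # Map-reduce that emits every key of every document; the output collection
    # "<supplier_name>_keys" ends up with one document per distinct field name.
    # Collection.map_reduce exists in PyMongo 3.x but was removed in PyMongo 4.0.
    mapper = Code("function() { for (var key in this) { emit(key, null); } }")
    reducer = Code("function(key, values) { return null; }")
    keys = coll.map_reduce(mapper, reducer, supplier_name + "_keys")
    cols4db = sorted(doc["_id"] for doc in keys.find())

    # Guess types from one sample document (the 101st); fall back to the first
    # document, or an empty dict, if the collection is smaller than that.
    sample = coll.find_one(skip=100) or coll.find_one() or {}

    # Look types up by field name so every column gets exactly one type, even
    # when the sample document lacks some fields (those default to "string:10").
    # Unmapped Python types fall back to "string:50".
    types_ = []
    for col in cols4db:
        if col in sample:
            types_.append(TYPE_MAP.get(type(sample[col]).__name__, "string:50"))
        else:
            types_.append("string:10")

    return cols4db, types_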

Initially, I picked a new random skip offset every time the number of fields and types didn't match, but I decided to stick with a fixed, "randomly chosen" offset of 100.
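As for a better way to get the types: one option is to look at many documents instead of one and only commit to a type when the sample agrees. A sketch under assumptions: the sample documents could come from coll.aggregate([{"$sample": {"size": 1000}}]) (MongoDB 3.2+); monary_type, infer_types and the "string:50" fallback are hypothetical choices, not part of Monary or PyMongo; mixed int/float fields are widened to float64, and any other mix falls back to a string type, mirroring the answer's own fallback. String columns are sized to the longest value seen, so longer values in unsampled documents may not fit:

```python
from collections import defaultdict
from datetime import datetime

# Hypothetical fallback for fields whose type can't be pinned down.
DEFAULT_STRING = "string:50"


def monary_type(value):
    """Map one Python value to a Monary base type, or None if unmapped."""
    # bool must be checked before int, because bool is a subclass of int.
    if isinstance(value, bool):
        return "bool"
    if isinstance(value, int):
        return "int64"
    if isinstance(value, float):
        return "float64"
    if isinstance(value, datetime):
        return "date"
    if isinstance(value, str):
        return "string"
    # Compare by name so bson does not need to be imported here.
    if type(value).__name__ == "ObjectId":
        return "id"
    return None


def infer_types(docs, fields):
    """Infer one Monary type string per field from an iterable of sample documents."""
    seen = defaultdict(set)     # field -> Monary base types observed
    max_len = defaultdict(int)  # field -> longest UTF-8 string observed (bytes)
    for doc in docs:
        for field in fields:
            value = doc.get(field)
            if value is None:
                continue  # missing or null values say nothing about the type
            kind = monary_type(value)
            seen[field].add(kind)
            if kind == "string":
                max_len[field] = max(max_len[field], len(value.encode("utf-8")))

    types = []
    for field in fields:
        kinds = seen[field]
        if kinds == {"int64", "float64"}:
            types.append("float64")  # widen mixed numeric fields
        elif kinds == {"string"}:
            types.append("string:%d" % max(max_len[field], 1))
        elif len(kinds) == 1 and None not in kinds:
            types.append(next(iter(kinds)))
        else:
            types.append(DEFAULT_STRING)  # unseen, mixed or unrecognized
    return types
```

The returned list lines up position by position with the field list, so something like infer_types(list(coll.aggregate([{"$sample": {"size": 1000}}])), cols4db) could stand in for the single-document lookup in getColsfromdb.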

KugelBlitz