I have a two-part question that has been keeping me from proceeding for a while. I have read lots of articles, checked Stack Overflow numerous times, and reread the mongoengine docs, but I cannot find an answer that works for me. I am using MongoDB to store the data of a Flask web app. To query the DB, I am using mongoengine. Now suppose my users model looks like this:

Users

name: Superman
kudos:
       0: date, category A
       1: date, category B

name: Superman
kudos:
       0: date, category A
       1: date, category A
       2: date, category B

The kudos are nested documents which get created whenever a user receives a kudo. I store them as a db.ListField(date=now). This is working perfectly fine.

In a relational DB I would have a separate kudo schema. In MongoDB I assumed it would be the better solution to create nested documents within the User collection. Otherwise you are still creating all kinds of separate schemas with relations to each other.

So here are my two main questions:

  1. Am I correct that my architecture is true to how mongoengine should be used?
  2. How can I get a list (a dict, actually) of kudos per category? I would like to query and get Category - Count.

Result should be: kudos=[(category A, 3), (category B, 2)]

If I had something even remotely working I would provide it, but I am completely stuck. That's why I even started wondering about storing the kudos in a separate collection, but I feel like I am then starting to get off track in correctly using a NoSQL DB.

1 Answer

Assuming you have the following schema and data:

import datetime as dt
from mongoengine import *

connect(host='mongodb://localhost:27017/testdb')


class Kudo(EmbeddedDocument):
    date = DateTimeField(default=dt.datetime.utcnow)
    category = StringField()

class User(Document):
    name = StringField(required=True)
    kudos = EmbeddedDocumentListField(Kudo)


superman = User(name='superman', kudos=[Kudo(category='A')]).save()
batman = User(name='batman', kudos=[Kudo(category='A'), Kudo(category='B')]).save()

This isn't the most efficient approach, but you can get the distribution with the following simple snippet:

import itertools
from collections import Counter

# scalar('kudos') yields one list of Kudo objects per user
raw_kudos = User.objects.scalar('kudos')
# Flatten the per-user lists and count each category
categories_counter = Counter(k.category for k in itertools.chain.from_iterable(raw_kudos))
print(categories_counter)    # a dict subclass --> Counter({'A': 2, 'B': 1}) with only the two users above

If you need higher performance, you'll need to use an aggregation pipeline.
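As a rough sketch of what such a pipeline could look like (not tested against your data): unwind the embedded kudos list, then group by category and count. The names PIPELINE and kudos_per_category are made up for this example, and it assumes the fields are stored under their default names (kudos and category). The function takes any object with a pymongo-style aggregate(pipeline) method.

```python
# Hypothetical names for illustration; assumes the default field names
# "kudos" and "category" from the schema above.
PIPELINE = [
    # Emit one document per embedded kudo
    {"$unwind": "$kudos"},
    # Group the unwound documents by category and count them
    {"$group": {"_id": "$kudos.category", "count": {"$sum": 1}}},
    # Sort by category name for a stable order
    {"$sort": {"_id": 1}},
]


def kudos_per_category(collection):
    """Return a {category: count} dict computed on the server side.

    `collection` is assumed to expose a pymongo-style aggregate(pipeline).
    """
    return {doc["_id"]: doc["count"] for doc in collection.aggregate(PIPELINE)}
```

With the schema above, the underlying pymongo collection should be reachable through User._get_collection(), so the call would be kudos_per_category(User._get_collection()). The counting then happens inside MongoDB instead of loading every user into Python.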

bagerard