I'm trying to design a document structure that gives good read performance without making writes too slow. I need to store a UserConnection document that represents the link between two users in my database. Each link has a weight that depends on a list of parameters.

The following schema shows the data, the links, and the components that make up the weights:

UserConnection      Services          Components
  UserA             Name              Name
  UserB             Weight            Weight
  Weight            Components
  Services

UserConnection.Weight = sum(UserConnection.Services.Weight)
UserConnection.Services.Weight = sum(UserConnection.Services.Components.Weight)
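
As a quick sanity check of the two formulas above, here is a minimal plain-Python sketch with no database involved. The user names, service names, component names, and weights are made-up sample data, and the two helper functions are hypothetical:

```python
# Hypothetical sample data: one connection with two services.
connection = {
    "user_a": "alice",
    "user_b": "bob",
    "services": [
        {"name": "email", "components": [
            {"name": "sent", "weight": 0.25},
            {"name": "received", "weight": 0.5},
        ]},
        {"name": "chat", "components": [
            {"name": "messages", "weight": 1.0},
        ]},
    ],
}


def service_weight(service):
    """Sum of the weights of a service's components."""
    return sum(c["weight"] for c in service["components"])


def connection_weight(conn):
    """Sum of the weights of a connection's services."""
    return sum(service_weight(s) for s in conn["services"])


print(connection_weight(connection))  # 1.75
```

The connection weight is therefore fully determined by the component weights, which is why it can be either computed on read or cached on write.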

I'm currently using MongoEngine with Django. I tried to define a structure using EmbeddedDocument and EmbeddedDocumentListField, but I'm not happy with it.

I'm leaving out my tests for brevity's sake.

Can anyone please help me find a good structure for MongoDB?

Thank you.

1 Answer

Because MongoEngine defines the document structure in Python classes, you can use ordinary @property-decorated methods to return calculated values.

Consider the following example:

import mongoengine as mdb

mdb.connect("so-37396173")


class User(mdb.Document):
    name = mdb.StringField()


class UserConnection(mdb.Document):
    user_a = mdb.ReferenceField('User')
    user_b = mdb.ReferenceField('User')
    services = mdb.ListField(mdb.ReferenceField('Service'))

    @property
    def weight(self):
        return sum([s.weight for s in self.services])


class Component(mdb.EmbeddedDocument):
    name = mdb.StringField()
    weight = mdb.FloatField()


class Service(mdb.Document):
    name = mdb.StringField()
    components = mdb.EmbeddedDocumentListField('Component')

    @property
    def weight(self):
        return sum([c.weight for c in self.components])

When you have a UserConnection object you can then access a weight attribute:

>>> uc = UserConnection.objects.first()
>>> uc.weight
0.8544546532

The drawback is that weight is never stored in the database for a UserConnection, so you can't aggregate or sort on it at that level, although the aggregation framework may provide some good options. If you need the weight saved, you could define signal handlers that set it before the document is saved (MongoEngine signals require the blinker package to be installed):

import mongoengine as mdb

mdb.connect("so-37396173")

class User(mdb.Document):
    name = mdb.StringField()


class UserConnection(mdb.Document):
    user_a = mdb.ReferenceField('User')
    user_b = mdb.ReferenceField('User')
    services = mdb.ListField(mdb.ReferenceField('Service'))
    weight = mdb.FloatField()

    @classmethod
    def calc_weight(cls, sender, document, **kwargs):
        # pre_save handler: store the sum of the referenced services' weights
        document.weight = sum(s.weight for s in document.services)


mdb.signals.pre_save.connect(UserConnection.calc_weight, sender=UserConnection)

class Component(mdb.EmbeddedDocument):
    name = mdb.StringField()
    weight = mdb.FloatField()


class Service(mdb.Document):
    name = mdb.StringField()
    components = mdb.EmbeddedDocumentListField('Component')
    weight = mdb.FloatField()

    @classmethod
    def calc_weight(cls, sender, document, **kwargs):
        # pre_save handler: store the sum of the embedded components' weights
        document.weight = sum(c.weight for c in document.components)


mdb.signals.pre_save.connect(Service.calc_weight, sender=Service)

A drawback of this approach is that the signal only fires when you call the save method, so doing an update with upsert=True won't set the weights.
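
One possible workaround, sketched here rather than taken from the models above, is to compute the weight yourself and send it in the same update. The helper name service_update_kwargs is hypothetical, and the SimpleNamespace objects only stand in for Component instances so the snippet runs without a database:

```python
from types import SimpleNamespace


def service_update_kwargs(components):
    """Build mongoengine-style update kwargs with the weight precomputed.

    `components` is any iterable of objects with a `weight` attribute,
    for example Component instances.
    """
    components = list(components)
    return {
        "set__components": components,
        "set__weight": sum(c.weight for c in components),
    }


# Stand-ins for Component instances (hypothetical sample data).
parts = [
    SimpleNamespace(name="sent", weight=0.25),
    SimpleNamespace(name="received", weight=0.5),
]
kwargs = service_update_kwargs(parts)
print(kwargs["set__weight"])  # 0.75

# Assumed usage with the Service model, passing real Component instances:
# Service.objects(name="email").update_one(upsert=True, **kwargs)
```

This only keeps Service.weight in step with its components. Any UserConnection that references the service would still hold its old stored weight until it is saved or updated again.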

Whether you use embedded documents or references depends on what else you want to do with the data.
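
On the aggregation framework option mentioned earlier, here is a rough sketch of a pipeline that computes each service's weight on the server instead of storing it. It assumes MongoDB 3.2 or later (where $sum can be used inside $project) and the Service model with embedded components; the variable name is hypothetical:

```python
# Pipeline for the Service collection: compute each service's weight from
# its embedded components on the server, then sort by it (heaviest first).
service_weight_pipeline = [
    {"$project": {
        "name": 1,
        # "$components.weight" resolves to the array of component weights.
        "weight": {"$sum": "$components.weight"},
    }},
    {"$sort": {"weight": -1}},
]

# Assumed usage through the underlying pymongo collection:
# for doc in Service._get_collection().aggregate(service_weight_pipeline):
#     print(doc["name"], doc["weight"])
```

Rolling this up to the UserConnection level would need an extra join stage because the services are stored as references, which is not shown here.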

Steve Rossiter
  • Thank you Steve for your answer. I have two more questions: 1. Why did you use ReferenceField and not EmbeddedDocument for the Service collection? 2. I need to store the sum rather than calculate it on the fly every time I access the data. Maybe the weight field in Service could be filled by a pre_save signal, so the sum is done only when a component is added, modified, or removed. What do you think? – Stefano Falsetto May 24 '16 at 15:09
  • Because EmbeddedDocuments are not stored in their own collection but are stored in the parent document collection. Having documents in a collection can make some querying and indexing a bit easier if there are many items. I agree about the signals as the right choice, I'll edit the answer to show that approach as well. – Steve Rossiter May 24 '16 at 16:10
  • Thank you again Steve! It would be nice to have a signal that can be used in update with upsert too. – Stefano Falsetto May 25 '16 at 08:52