0

I just came across a scenario that I don't know how to resolve with the existing structure of my documents. As shown below I can obviously resolve this problem with some refactoring but I am curious about how this could be resolve the most efficiently possible and respecting the same structure.

Please see that this queestion is different to How to Do An Atomic Update on an EmbeddedDocument in a ListField in MongoEngine?

Let's suppose the following models:

class Scans(mongoengine.EmbeddedDocument):
    peer = mongoengine.ReferenceField(Peers, required=True)
    site = mongoengine.ReferenceField(Sites, required=True)
    process_name = mongoengine.StringField(default=None)
    documents = mongoengine.ListField(mongoengine.ReferenceField('Documents'))
    is_complete = mongoengine.BooleanField(default=False)  
    to_start_at = mongoengine.DateTimeField()  
    started = mongoengine.DateTimeField()  
    finished = mongoengine.DateTimeField()


class ScanSettings(mongoengine.Document):
    site = mongoengine.ReferenceField(Sites, required=True)
    max_links = mongoengine.IntField(default=100)  
    max_size = mongoengine.IntField(default=1024)  
    mime_types = mongoengine.ListField(default=['text/html'])
    is_active = mongoengine.BooleanField(default=True)  
    created = mongoengine.DateTimeField(default=datetime.datetime.now)
    repeat = mongoengine.StringField(choices=REPEAT_PATTERN)
    scans = mongoengine.EmbeddedDocumentListField(Scans)

What I would like to do is to insert a ScanSettings object if and only if all elements of the scans fields - list of Scans embedded documents - have in turn their document list unique? By unique I mean all elements within the list at database level and not the whole list - that'd be easy.

In plain English if at the time of inserting ScanSetting any element of the scans list has a instance of scans which list of documents are duplicated, then such insertion should not happen. I mean uniqueness at the database level, taking into account existing records if any.

Given that Mongo does not support uniqueness across all elements of a list within the same document I find two solutions:

Option A

I refactor my "schema" and make Scans collection inherit from Document rather than Embedded document and change the scans field of ScanSettings to be a ListField of ReferenceFields to Scans documents. Then it is easy as I just need to save the Scans first using "Updates" with operator "add_to_set" and option "upsert=True". Then once the operation has been approved, save the ScanSettings. I will need the number of scans instances to insert + 1 queries.

Option B I keep the same "schema" but somehow generates unique IDs for the Scans embedded document. Then before any insertion of Scan Settings with a non-empty scans field I'll fetch the already existing records to see if there are duplicated document's ObjectIds among the just retrieved records and the ones to be inserted. In other words I would check uniqueness via Python rather than using MogoneEngine/Mongodb. I will need 2 x number of scan instances to insert (read + update with add_set_operator) + 1 ScanSettings save

Option C Ignore Uniqueness. Given how my model will be structured I am pretty sure there will be no duplicates or if any, it will be negligible. Then deal with duplicates at reading time. For those like me coming from Relational databases this solution feels hitching.

I am a novice in Mongo so I appreciate any comments. Thanks.

PS: I am using latest MongoEngine and free Mongodb.

Thanks a lot in advance.

pgonz
  • 1
  • 1
  • 2

1 Answers1

0

I finally went for Option A so I refactor my model to:

a) Create a Mixin class that inherits from a Document class to add two methods: overriding 'save' so that it only allows saves when the list of unique documents is empty and 'save_with_uniqueness' which allows saves and/or updates when the list of documents is empty. The idea is to enforce uniqueness.

b) Refactor both Scans and ScanSettings sot that the former redefine the 'scans' field as a ListField of references to Scans and the latter so that inherits from Document rather than Embedded Document.

c) The reality is that Scans and ScanSettings are now inheriting from the Mixin class as both classes need to guarantee uniqueness for both of their attribute 'documents' and 'scans', respectively. Hence the Mixin class.

With a) and b) I can guarantee uniqueness and save first each scan instance for it to later on be added to ScanSettings.scans in the usual way.

A few points for novices like me:

  1. See that I am using inheritance. For it to work you need too add a attribute in the meta dictionary to allow inheritance as shown in the model below.
  2. In my case I wanted to have Scans and ScanSettings in different collections so I had to make them 'abstract' as shown in the meta dictionary of the Mixin class too.
  3. For the save_with_uniqueness I used upsert=True so that if a record can created if this does not exist. The idea is to use 'save_with_uniqueness' in the same way as 'save, creating or updating a document if this exist or not.
  4. I also used 'full_result' flag as I need to return ObjectId of the latest record inserted.
  5. Document._fields is a dictionary that contain the fields that compose that document. I actually wanted to create a general-purpose save_with_uniqueness method so I did not want to have to manually type in the fields of the Document or just duplicate unnecessary code - hence the Mixin.

Finally the code. It's not fully tested but enough to get the main idea right for what I need.

class UniquenessMixin(mongoengine.Document):


def save(self, *args, **kwargs):
    try:
        many_unique = kwargs['many_unique']
    except KeyError:
        pass
    else:
        attribute = getattr(self, many_unique)
        self_name = self.__class__.__name__
        if len(attribute):
            raise errors.DbModelOperationError(f"It looks like you are trying to save a {self.__class__.__name__} "
                                               f"object with a non-empty list of {many_unique}. "
                                               f"Please use '{self_name.lower()}.save_with_uniqueness()' instead")
    return super().save(*args, **kwargs)

def save_with_uniqueness(self, many_unique):
    attribute = getattr(self, many_unique)
    self_name = self.__class__.__name__
    if not len(attribute):
        raise errors.DbModelOperationError(f"It looks like you are trying to save a {self_name} object with an "
                                           f"empty list {many_unique}. Please use '{self_name.lower()}.save()' "
                                           f"instead")

    updates, removals = self._delta()
    if not updates:
        raise errors.DbModelOperationError(f"It looks like you are trying to update '{self.__class__.__name__}' "
                                           f"but no fields were modified since this object was created")

    kwargs = {(key if key != many_unique else 'add_to_set__' + key): value for key, value in updates.items()}
    pk = bson.ObjectId() if not self.id else self.id
    result = self.__class__.objects(id=pk).update(upsert=True, full_result=True, **kwargs)

    try:
        self.id = result['upserted']
    except KeyError:
        pass
    finally:
        return self.id

meta = {'allow_inheritance': True, 'abstract': True}

class Scans(UniquenessMixin):

    peer = mongoengine.ReferenceField(Peers, required=True)
    site = mongoengine.ReferenceField(Sites, required=True)
    process_name = mongoengine.StringField(default=None)
    documents = mongoengine.ListField(mongoengine.ReferenceField('Documents'))
    is_complete = mongoengine.BooleanField(default=False)  
    to_start_at = mongoengine.DateTimeField()  
    started = mongoengine.DateTimeField()  
    finished = mongoengine.DateTimeField()

    meta = {'collection': 'Scans'}


class ScanSettings(UniquenessMixin):

       site = mongoengine.ReferenceField(Sites, required=True)
    max_links = mongoengine.IntField(default=100)  
    max_size = mongoengine.IntField(default=1024)  
    mime_types = mongoengine.ListField(default=['text/html'])
    is_active = mongoengine.BooleanField(default=True)  
    created = mongoengine.DateTimeField(default=datetime.datetime.now)
    repeat = mongoengine.StringField(choices=REPEAT_PATTERN)
    scans = mongoengine.ListField(mongoengine.ReferenceField(Scans))

    meta = {'collection': 'ScanSettings'}
pgonz
  • 1
  • 1
  • 2