I just came across a scenario that I don't know how to resolve with the existing structure of my documents. As shown below I can obviously resolve this problem with some refactoring but I am curious about how this could be resolve the most efficiently possible and respecting the same structure.
Please see that this queestion is different to How to Do An Atomic Update on an EmbeddedDocument in a ListField in MongoEngine?
Let's suppose the following models:
class Scans(mongoengine.EmbeddedDocument):
peer = mongoengine.ReferenceField(Peers, required=True)
site = mongoengine.ReferenceField(Sites, required=True)
process_name = mongoengine.StringField(default=None)
documents = mongoengine.ListField(mongoengine.ReferenceField('Documents'))
is_complete = mongoengine.BooleanField(default=False)
to_start_at = mongoengine.DateTimeField()
started = mongoengine.DateTimeField()
finished = mongoengine.DateTimeField()
class ScanSettings(mongoengine.Document):
site = mongoengine.ReferenceField(Sites, required=True)
max_links = mongoengine.IntField(default=100)
max_size = mongoengine.IntField(default=1024)
mime_types = mongoengine.ListField(default=['text/html'])
is_active = mongoengine.BooleanField(default=True)
created = mongoengine.DateTimeField(default=datetime.datetime.now)
repeat = mongoengine.StringField(choices=REPEAT_PATTERN)
scans = mongoengine.EmbeddedDocumentListField(Scans)
What I would like to do is to insert a ScanSettings object if and only if all elements of the scans fields - list of Scans embedded documents - have in turn their document list unique? By unique I mean all elements within the list at database level and not the whole list - that'd be easy.
In plain English if at the time of inserting ScanSetting any element of the scans list has a instance of scans which list of documents are duplicated, then such insertion should not happen. I mean uniqueness at the database level, taking into account existing records if any.
Given that Mongo does not support uniqueness across all elements of a list within the same document I find two solutions:
Option A
I refactor my "schema" and make Scans collection inherit from Document rather than Embedded document and change the scans field of ScanSettings to be a ListField of ReferenceFields to Scans documents. Then it is easy as I just need to save the Scans first using "Updates" with operator "add_to_set" and option "upsert=True". Then once the operation has been approved, save the ScanSettings. I will need the number of scans instances to insert + 1 queries.
Option B I keep the same "schema" but somehow generates unique IDs for the Scans embedded document. Then before any insertion of Scan Settings with a non-empty scans field I'll fetch the already existing records to see if there are duplicated document's ObjectIds among the just retrieved records and the ones to be inserted. In other words I would check uniqueness via Python rather than using MogoneEngine/Mongodb. I will need 2 x number of scan instances to insert (read + update with add_set_operator) + 1 ScanSettings save
Option C Ignore Uniqueness. Given how my model will be structured I am pretty sure there will be no duplicates or if any, it will be negligible. Then deal with duplicates at reading time. For those like me coming from Relational databases this solution feels hitching.
I am a novice in Mongo so I appreciate any comments. Thanks.
PS: I am using latest MongoEngine and free Mongodb.
Thanks a lot in advance.