We are working with the MarkLogic Data Hub Framework and ingest documents into a unitemporal database via the REST API's multi-document write endpoint (POST /v1/documents).
Sometimes the document updates we receive this way contain documents that have not actually changed. We do not want to write such documents to MarkLogic: because the database is unitemporal, every write creates a new version, which results in misleading timestamps and unnecessary storage use.
We have written code that detects duplicates (by hashing the content), but we do not know how to skip a duplicate document while the non-duplicate documents in the same request are still processed. In other words, when a single request contains both duplicate and non-duplicate documents, how can we write only the non-duplicates? The Data Hub Framework does not offer a plugin for modifying the document write itself, since that is controlled by the REST API. We tried throwing an fn:error() in the content plugin, but unfortunately that aborts the entire multi-document write instead of only the writes of the documents that raised the error.
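To illustrate the kind of hash comparison described above, here is a minimal Python sketch that splits a batch into documents to write and unchanged duplicates before the multi-document request is built. It assumes the documents are JSON and that a lookup of URI to hash of the latest ingested version is available; `content_hash`, `split_batch` and `known_hashes` are hypothetical names, not Data Hub Framework or MarkLogic APIs.

```python
import hashlib
import json
from typing import Dict, List, Tuple


def content_hash(doc: dict) -> str:
    """SHA-256 of a canonical JSON serialization (sorted keys, compact separators),
    so that key order and whitespace differences do not change the hash."""
    canonical = json.dumps(doc, sort_keys=True, separators=(",", ":"), ensure_ascii=False)
    return hashlib.sha256(canonical.encode("utf-8")).hexdigest()


def split_batch(
    batch: Dict[str, dict], known_hashes: Dict[str, str]
) -> Tuple[Dict[str, dict], List[str]]:
    """Split a batch of URI -> document into (documents to write, skipped duplicate URIs).

    known_hashes is a hypothetical URI -> hash lookup of the latest ingested versions.
    """
    to_write: Dict[str, dict] = {}
    skipped: List[str] = []
    for uri, doc in batch.items():
        # Unchanged content: same hash as the version already stored for this URI.
        if known_hashes.get(uri) == content_hash(doc):
            skipped.append(uri)
        else:
            to_write[uri] = doc
    return to_write, skipped


if __name__ == "__main__":
    known = {"/customer/1.json": content_hash({"id": 1, "name": "Ann"})}
    batch = {
        "/customer/1.json": {"name": "Ann", "id": 1},  # unchanged, only key order differs
        "/customer/2.json": {"id": 2, "name": "Bob"},  # new document
    }
    to_write, skipped = split_batch(batch, known)
    print(sorted(to_write))  # ['/customer/2.json']
    print(skipped)           # ['/customer/1.json']
```

Canonicalizing the JSON before hashing avoids treating a document as changed just because its keys were serialized in a different order.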