How to abort some but not all ingest flows in MarkLogic Data Hub Framework

Question

We are working with the MarkLogic Data Hub Framework and ingesting documents into a unitemporal database via the REST multi-document write endpoint (v1/documents).

Sometimes we receive updates this way for documents that didn't actually change. In that case we do not want to write these documents to MarkLogic: because the database is unitemporal, every write creates a new version, which results in misleading timestamps and wasted storage space.

We have written some code to detect duplicates (using hashing). However, we do not know how to abort the ingestion of a duplicate document while the non-duplicate documents in the same request are still processed. That is, when a single request contains both duplicate and non-duplicate documents, how can we prevent writing only the duplicates? The Data Hub Framework does not have any plugins to modify the document writing (as this is controlled by the REST API). We tried throwing an fn:error() in the content plugin, but unfortunately that aborts the whole multi-document write instead of only the writes for the documents that raise the error.

score 0 · Answer 1 · edited Jul 11 '18 at 20:28

I'm taking something of a shot in the dark here without seeing your code, but I imagine you can return an empty sequence instead of calling fn:error or xdmp:document-insert in cases where you detect a duplicate, and that should work out just fine.

edited Jul 11 '18 at 20:28

Mads Hansen

answered Jul 11 '18 at 15:24

Rob S.

Yup, we have tried that, but it doesn't work. If you transform the document to an empty sequence in the Data Hub Framework, the documents API still writes an empty document. – T. Philippi Jul 12 '18 at 06:37

score 0 · Accepted Answer · answered Jan 09 '19 at 09:14

We eventually discussed this with a MarkLogic Solution Architect, and the conclusion was that this is not possible with the default v1/documents API.

What we did to resolve this was to write our own custom API as a v1/resources extension. This API calls the Data Hub Framework code itself and then writes only the documents that are not duplicates.
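For anyone looking for the general shape of such a filter-then-write step, here is a rough sketch in plain JavaScript. It is not the code from our project and it does not use any MarkLogic API: `contentHash`, `partitionDuplicates`, `ingest`, `lookupStoredHash` and `writeDocument` are all hypothetical names. The assumption is that inside a real resource extension, `lookupStoredHash` would read the hash stored with the current version of the document and `writeDocument` would perform the temporal insert.

```javascript
const crypto = require('crypto');

// Hypothetical helper: hash of the document content, used to detect
// unchanged documents. Assumes JSON content with a stable key order.
function contentHash(content) {
  return crypto
    .createHash('sha256')
    .update(JSON.stringify(content))
    .digest('hex');
}

// Split one multi-document request into documents to write and
// documents to skip. `lookupStoredHash(uri)` is an assumed callback
// that returns the hash of the current version, or null if none exists.
function partitionDuplicates(docs, lookupStoredHash) {
  const toWrite = [];
  const skipped = [];
  for (const doc of docs) {
    const hash = contentHash(doc.content);
    if (lookupStoredHash(doc.uri) === hash) {
      skipped.push(doc.uri); // unchanged: do not create a new version
    } else {
      toWrite.push({ uri: doc.uri, content: doc.content, hash });
    }
  }
  return { toWrite, skipped };
}

// Handle one request: only non-duplicates reach `writeDocument`,
// so a duplicate never aborts the rest of the batch.
function ingest(docs, lookupStoredHash, writeDocument) {
  const { toWrite, skipped } = partitionDuplicates(docs, lookupStoredHash);
  for (const doc of toWrite) {
    writeDocument(doc);
  }
  return { written: toWrite.map((d) => d.uri), skipped };
}
```

The point of the design is that the decision to skip is made before any write is attempted, so nothing has to be rolled back and no error has to be raised for the duplicates.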
