
I have a question about patching documents in a MarkLogic database using the REST API.

I have a service written in .NET Core and I use MarkLogic as my data store. In my case I must patch thousands of documents and, if possible, I don't want to make thousands of requests. To be more specific - I must add a few properties to a certain part of a JSON document. Following these guides:

I understand that a PATCH request can only update one document at a time, so I tried to make a POST like this (for now with only one example patch operation):

POST http://host:port/v1/documents HTTP/1.1
Authorization: Basic autorization
Content-Type: multipart/mixed; boundary=BOUNDARY

--BOUNDARY
Content-Type: application/json
Content-Disposition: category=content; attachment; filename=/documents/first_document_to_update.json
X-HTTP-Method-Override: PATCH

{
  "patch": [
    {
      <patch property insert definition>
    }
  ]
}
--BOUNDARY--

but it just created a document with the content of that body part at the URI from the Content-Disposition header. I also tried to use the X-HTTP-Method-Override header directly on the POST request, but that didn't work out either - I got

{
  "errorResponse": {
    "statusCode": 400,
    "status": "Bad Request",
    "messageCode": "REST-REQUIREDPARAM",
    "message": "REST-REQUIREDPARAM: (err:FOER0000) Required parameter: uri"
  }
}

So my conclusion is that it is not possible to patch multiple documents with one POST request - am I right, or am I missing something important?

MarkLogic version: 10
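
For reference, the per-document patch that does work is a plain PATCH against /v1/documents with the document URI passed as a request parameter rather than in a Content-Disposition header. The context path and inserted content below are placeholders, not anything from the original post:

```http
PATCH http://host:port/v1/documents?uri=/documents/first_document_to_update.json HTTP/1.1
Authorization: Basic authorization
Content-Type: application/json

{
  "patch": [
    {
      "insert": {
        "context": "/some/part",
        "position": "last-child",
        "content": { "newProperty": "value" }
      }
    }
  ]
}
```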

  • Why not make thousands of requests? If you were to do that in a multi-threaded fashion, you can get a lot more done, you don't have to worry about huge transactions either timing out or blowing some limit and erroring out, and you can spread the load across the cluster instead of using just one node. Even if you figure out how to do multiple in one shot, I would recommend doing it multi-threaded with many small requests, similar to a CoRB job. – Mads Hansen Sep 01 '22 at 11:37
  • I am replying here since I do not want the reply to become a solution. Any way of doing this with the APIs you suggest would be abusing some options by creating side effects. Follow the guidance from Mads Hansen - multiple connections are your friend. HTTP has the notion of keep-alive - written correctly, there is no new socket opened per call. CoRB2 would do what you want - the URIs passed can be multiple (and looped over in the code). You could replicate CoRB2 using the EVAL endpoint. But then how do you handle errors on a single update when doing a batch? So, now we are back at multiple connections... – David Ennis -CleverLlamas.com Sep 01 '22 at 14:29
  • Thank you for your replies! I knew that it is not really a bad solution, but I needed more arguments. I hadn't thought about multiple requests in the way you present it. It is very true - we have a few nodes and a load balancer, and I will not have to care about bigger timeouts etc. Implementing this in a multi-threaded fashion now sounds like a solution with a good outcome. I am going to implement it that way and post some experience here if I notice something worth sharing. Thanks again! – CraneSenior Sep 02 '22 at 07:42
  • @MadsHansen I think it would be useful for you to post your thoughts as an answer, which OP can then accept. – Dave Cassel Sep 02 '22 at 13:35

1 Answer


Why not make thousands of requests?

If you were to do that in a multi-threaded fashion, you can get a lot more done and don't have to worry about huge transactions either timing out or blowing some limit and erroring out. And you can spread the load across the cluster instead of using just one node to accomplish all of the work.

Even if you figure out how to do multiple in one shot, I would recommend doing it multi-threaded with many small requests instead, similar to a CoRB job.
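
A minimal sketch of the many-small-requests pattern, here in Python for brevity (the same shape maps onto .NET's HttpClient with Parallel.ForEachAsync or Task.WhenAll). The host, port, credentials, context path, and inserted property are all assumptions, not values from the question:

```python
import base64
import concurrent.futures
import json
import urllib.parse
import urllib.request

HOST = "localhost"  # assumption: point this at your cluster or load balancer
PORT = 8000
AUTH = base64.b64encode(b"user:password").decode()  # hypothetical credentials

# MarkLogic JSON patch body: insert a property as the last child of a
# context node. Context and content here are placeholders.
PATCH_BODY = json.dumps({
    "patch": [{
        "insert": {
            "context": "/some/part",
            "position": "last-child",
            "content": {"newProperty": "value"},
        }
    }]
}).encode()

def patch_document(uri: str) -> int:
    """Send one small PATCH to /v1/documents for a single document URI."""
    url = f"http://{HOST}:{PORT}/v1/documents?uri={urllib.parse.quote(uri)}"
    req = urllib.request.Request(
        url,
        data=PATCH_BODY,
        method="PATCH",
        headers={
            "Content-Type": "application/json",
            "Authorization": f"Basic {AUTH}",
        },
    )
    with urllib.request.urlopen(req) as resp:
        return resp.status

def patch_all(uris, send=patch_document, workers=8):
    """Fan the URIs out over a small thread pool, one request per document."""
    with concurrent.futures.ThreadPoolExecutor(max_workers=workers) as pool:
        return list(pool.map(send, uris))
```

Each request is tiny, so a failure affects only one document, and the load balancer spreads the calls across nodes; the `send` parameter just makes the fan-out testable without a live server.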

Mads Hansen