HTTP ReST: update large collections: better approach than JSON PATCH?

Question

I am designing a web service to regularly receive updates to lists. At this point, a list can still be modeled as a single entity (/lists/myList) or an actual collection with many resources (/lists/myList/entries/<ID>). The lists are large (millions of entries) and the updates are small (often less than 10 changes).

The client will get web service URLs and lists to distribute, e.g.:

http://hostA/service/lists: list1, list2
http://hostB/service/lists: list2, list3
http://hostC/service/lists: list1, list3

It will then push lists and updates as configured. It is likely, but not certain, that there is a database behind the web service URLs.

I have been researching and it seems an HTTP PATCH using the JSON Patch format is the best approach.

Context and examples: Each list has an identifying name, a priority and millions of entries. Each entry has an ID (determined by the client) and several optional attributes. Example to create a list "requiredItems" with priority 1 and two list entries:

PUT /lists/requiredItems
Content-Type: application/json


{
  "priority": 1,
  "entries": {
    "1": {
      "color": "red",
      "validUntil": "2016-06-29T08:45:00Z"
    },
    "2": {
      "country": "US"
    }
  }
}

For updates, the client would first need to know what the list looks like now on the server. For this I would add a property "revision" to the list entity.

Then, I would query this attribute:

GET /lists/requiredItems?property=revision

Then the client would see what needs to change between the revision on the server and the latest revision known by the client and compose a JSON patch. Example:

PATCH /lists/requiredItems
Content-Type: application/json-patch+json

[
  { "op": "test", "path": "/revision", "value": 3 },
  { "op": "add", "path": "/entries/3", "value": { "color": "blue" } },
  { "op": "remove", "path": "/entries/1" },
  { "op": "remove", "path": "/entries/2/country" },
  { "op": "add", "path": "/entries/2/color", "value": "green" },
  { "op": "replace", "path": "/revision", "value": 10 }
]
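
A minimal sketch of how a client might compose such a patch from two snapshots of the entries. The names compose_patch and escape_token are hypothetical, and the sketch assumes every entry is a flat object of attributes, as in the example above. Paths are written as RFC 6901 JSON Pointers, which start with a slash.

```python
def escape_token(token):
    """Escape one JSON Pointer reference token (RFC 6901): '~' first, then '/'."""
    return token.replace("~", "~0").replace("/", "~1")


def compose_patch(old_entries, new_entries, old_revision, new_revision):
    """Build a JSON Patch (a list of operation dicts) turning old_entries into new_entries.

    The patch starts with a test of the revision the client believes the
    server holds and ends by setting the new revision.
    """
    ops = [{"op": "test", "path": "/revision", "value": old_revision}]

    # Entries that disappeared are removed as a whole.
    for entry_id in sorted(old_entries.keys() - new_entries.keys()):
        ops.append({"op": "remove", "path": "/entries/" + escape_token(entry_id)})

    # Entries that are new are added as a whole.
    for entry_id in sorted(new_entries.keys() - old_entries.keys()):
        ops.append({
            "op": "add",
            "path": "/entries/" + escape_token(entry_id),
            "value": new_entries[entry_id],
        })

    # Entries present on both sides are compared attribute by attribute.
    for entry_id in sorted(old_entries.keys() & new_entries.keys()):
        old, new = old_entries[entry_id], new_entries[entry_id]
        base = "/entries/" + escape_token(entry_id) + "/"
        for attr in sorted(old.keys() - new.keys()):
            ops.append({"op": "remove", "path": base + escape_token(attr)})
        for attr in sorted(new.keys()):
            if attr not in old:
                ops.append({"op": "add", "path": base + escape_token(attr), "value": new[attr]})
            elif old[attr] != new[attr]:
                ops.append({"op": "replace", "path": base + escape_token(attr), "value": new[attr]})

    ops.append({"op": "replace", "path": "/revision", "value": new_revision})
    return ops
```

Serialising the returned list with json.dumps would give a request body of the same shape as the PATCH example above, although the operations may come out in a different order.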

Questions:

- This approach has the drawback of slightly less client support due to the not-often-used HTTP verb PATCH. Is there a more compatible approach without sacrificing HTTP compatibility (idempotency et cetera)?
- Modelling the individual list entries as separate resources and using PUT and DELETE (perhaps with ETag and/or If-Match) seems an option (PUT /lists/requiredItems/entries/3, DELETE /lists/requiredItems/entries/1, PUT /lists/requiredItems/revision), but how would I make sure all those operations are applied when the network drops in the middle of an update chain? Is an HTTP PATCH allowed to work on multiple resources?
- Is there a better way to 'version' the lists, perhaps implicitly also improving how they are updated? Note that the client determines the revision number.
- Is it correct to query the revision number with GET /lists/requiredItems?property=revision? Should it be a separate resource like /lists/requiredItems/revision? If it should be a separate resource, how would I update it atomically (i.e. the list and revision are both updated or both not updated)?
- Would it work in JSON Patch to first test the revision value to be 3 and then update it to 10 in the same patch?

Please don't ask more than one question at a time. All of them are valid but you probably won't get an answer to all of them. — , Jun 29 '16 at 08:02
@LutzHorn: thanks. You are right. Should I remove all but one question and repost the rest separately? The introduction part would be the same... — Tomas Creemers, Jun 29 '16 at 08:27

score 2 · Answer 1 · edited Oct 07 '21 at 11:02


This approach has the drawback of slightly less client support due to the not-often-used HTTP verb PATCH.

As far as I can tell, PATCH is really only appropriate if your server is acting like a dumb document store, where the action is literally "please update your copy of the document according to the following description".

So if your resource really just is a JSON document that describes a list with millions of entries, then JSON-Patch is a great answer.

But if you are expecting that the patch will, as a side effect, update an entity in your domain, then I'm suspicious.

Is a HTTP PATCH allowed to work on multiple resources?

RFC 5789

The PATCH method affects the resource identified by the Request-URI, and it also MAY have side effects on other resources

I'm not keen on querying the revision number; it doesn't seem to have any clear advantage over an ETag/If-Match approach. One obvious disadvantage: the caches between you and the client don't know that the list and the version number are related, so a cache will happily tell a client that version 12 of the list is version 7, or vice versa.
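
To illustrate the ETag/If-Match alternative, here is a rough sketch of the check a server could perform before accepting an update. Everything in it is an assumption made for illustration: the in-memory store, the function names compute_etag and conditional_replace, and the choice to derive the ETag from a hash of the stored document (a server is free to use a revision counter instead).

```python
import hashlib
import json


def compute_etag(document):
    """Derive a strong ETag from a canonical JSON form of the document."""
    canonical = json.dumps(document, sort_keys=True, separators=(",", ":"))
    digest = hashlib.sha256(canonical.encode("utf-8")).hexdigest()
    return '"' + digest[:16] + '"'  # ETag values are quoted strings


def conditional_replace(store, path, if_match, new_document):
    """Replace store[path] only if the client's If-Match value is current.

    Returns (status_code, etag). Simplification: if_match is a single
    ETag or "*", not a comma-separated list.
    """
    if path not in store:
        return 404, None
    if if_match is None:
        return 428, None  # Precondition Required: refuse unconditional writes
    current_etag = compute_etag(store[path])
    if if_match != "*" and if_match != current_etag:
        return 412, current_etag  # Precondition Failed: the client's copy is stale
    store[path] = new_document
    return 204, compute_etag(new_document)
```

Because the ETag travels in the headers of the same response as the list representation, a cache cannot pair one version of the list with a different version's identifier.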

answered Jun 29 '16 at 15:23 by VoiceOfUnreason (edited Oct 07 '21 at 11:02 by Community)

Thanks! The web service may or may not have a database behind it; I'm designing the client and the web service but not the server side. Server side will be several different third party implementations. I've added some clarification at the beginning of the question. – Tomas Creemers Jun 29 '16 at 17:59
I don't understand how HTTP hopes to enable caching if a PATCH to /service/list1 is allowed to have an influence on any other resource. Just invalidate the entire cache each time a PATCH is requested? – Tomas Creemers Jun 29 '16 at 18:01
In fairness to PATCH, the same problems occur with POST, PUT, DELETE. Cache coherency is one of the two hard problems. – VoiceOfUnreason Jun 29 '16 at 18:21
JSON-Patch is the worst idea ever. It's amazing how it got through so many hearts and minds. – fiatjaf Apr 08 '17 at 14:44

score 1 · Answer 2 · edited Oct 07 '21 at 11:32

Answering my own question. My first bullet point may be opinion-based and, as has been pointed out, I've asked many questions in one post. Nevertheless, here's a summary of what was answered by others (VoiceOfUnreason) and my own additional research:

ETags are HTTP's resource 'hashes'. They can be combined with If-Match headers to build a versioning system. However, ETag headers are normally not used by a client to declare the ETag of a resource that is being created (PUT) or updated (POST/PATCH); the server storing the resource usually determines the ETag. I've not found anything explicitly forbidding this, but many implementations may assume that the server determines the ETag and get confused when one is provided with PUT or PATCH.

A separate revision resource is a valid alternative to ETags for versioning. This resource must be updated at the same time as the resource it is the revision of.

It is not semantically enforceable at the HTTP level to have commit/rollback transactions, except by modelling the transaction itself as a ReST resource, which would make things much more complicated.

However, some properties of PATCH allow it to be used for this:

An HTTP PATCH must be atomic and can operate on multiple resources. RFC 5789:
- The server MUST apply the entire set of changes atomically and never provide (e.g., in response to a GET during this operation) a partially modified representation. If the entire patch document cannot be successfully applied, then the server MUST NOT apply any of the changes.
- The PATCH method affects the resource identified by the Request-URI, and it also MAY have side effects on other resources; i.e., new resources may be created, or existing ones modified, by the application of a PATCH. PATCH is neither safe nor idempotent

JSON Patch can consist of multiple operations on multiple resources and all must be applied or none must be applied, making it an implicit transaction. RFC 6902:
- Operations are applied sequentially in the order they appear in the array.

Thus, the revision can be modeled as a separate resource and still be updated at the same time. Querying the current revision is a simple GET. Committing a transaction is a single PATCH request containing first a test of the revision, then the operations on the resource(s) and finally the operation to update the revision resource.
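
The commit step described above could be sketched on the server side roughly as follows. This is not a full RFC 6902 implementation: it assumes the document consists of nested JSON objects only (no arrays), supports just the test, add, remove and replace operations, and the names apply_patch and PatchError are made up for the example.

```python
import copy


class PatchError(Exception):
    """Raised when an operation cannot be applied; the stored document stays untouched."""


def _tokens(pointer):
    """Split an RFC 6901 JSON Pointer into unescaped reference tokens."""
    if not pointer.startswith("/"):
        raise PatchError(f"unsupported JSON Pointer: {pointer!r}")
    # Per RFC 6901, '~1' is unescaped before '~0'.
    return [t.replace("~1", "/").replace("~0", "~") for t in pointer[1:].split("/")]


def _parent_and_key(document, pointer):
    """Walk to the object that holds the pointer's last token."""
    *parents, key = _tokens(pointer)
    node = document
    for token in parents:
        if not isinstance(node, dict) or token not in node:
            raise PatchError(f"path not found: {pointer!r}")
        node = node[token]
    if not isinstance(node, dict):
        raise PatchError(f"path not found: {pointer!r}")
    return node, key


def apply_patch(document, operations):
    """Apply all operations in order to a copy; return the copy or raise PatchError."""
    working = copy.deepcopy(document)  # the original is never modified
    for op in operations:
        parent, key = _parent_and_key(working, op["path"])
        kind = op["op"]
        if kind == "add":
            parent[key] = op["value"]
        elif kind in ("remove", "replace", "test"):
            if key not in parent:
                raise PatchError(f"path not found: {op['path']!r}")
            if kind == "remove":
                del parent[key]
            elif kind == "replace":
                parent[key] = op["value"]
            elif parent[key] != op["value"]:
                raise PatchError(f"test failed at {op['path']!r}")
        else:
            raise PatchError(f"unsupported operation: {kind!r}")
    return working
```

Since the operations run in order, testing the revision for 3 at the start and replacing it with 10 at the end of the same patch works. If any step fails, the caller keeps the old document, which gives the all-or-nothing behaviour. Copying a list with millions of entries per request would be costly, so a real server would more likely rely on a database transaction for the same effect.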

The server can still choose to publish the revision as ETag of the main resource.
