
Everywhere you look, you see people recommending the use of the etag header in conjunction with if-match to do resource versioning in a REST API.

From reading the RFCs though, this is actually wrong.

ETags are specific to the representation, meaning that the XML and JSON versions of the exact same data would have different ETags. More importantly, the gzipped version would have a different ETag.

In addition to this, the etag is meant to be generated from the actual bytes that are transmitted, so using the database version field for it is actually wrong per the RFC.

Essentially it's because etags are designed for caching purposes, not for concurrent updates.

So, given that, what would be the "correct" way to handle concurrent updates in a REST API that conforms to the RFCs?

Graham

1 Answer


The spec you want to review is RFC 7232, which describes how conditional requests are handled in HTTP/1.1 (as of June 2014).

In particular, the specification describes strong and weak validators.

"Strong validators are usable for all conditional requests, including cache validation, partial content ranges, and 'lost update' avoidance. Weak validators are only usable when the client does not require exact equality with previously obtained representation data, such as when validating a cache entry or limiting a web traversal to recent changes."

Entity-tags come in both strong and weak flavors.

"An entity-tag is an opaque validator for differentiating between multiple representations of the same resource, regardless of whether those multiple representations are due to resource state changes over time, content negotiation resulting in multiple representations being valid at the same time, or both."

The spec further defines the strong comparison that MUST be used when comparing entity-tags for If-Match, which is a big hint that strong entity-tags are the "correct" way to handle conditional requests (which include state-changing requests).
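The two comparison functions defined in section 2.3.2 are small enough to sketch. This is a minimal Python illustration, not code from any library: `parse_etag`, `strong_match` and `weak_match` are hypothetical names, and the parser handles a single entity-tag only (not the comma-separated lists or `*` that If-Match also allows).

```python
from typing import Tuple


def parse_etag(value: str) -> Tuple[bool, str]:
    """Split an entity-tag such as W/"abc" into (is_weak, opaque_tag)."""
    value = value.strip()
    weak = value.startswith("W/")
    if weak:
        value = value[2:]
    # The opaque-tag must be wrapped in double quotes.
    if len(value) < 2 or value[0] != '"' or value[-1] != '"':
        raise ValueError(f"malformed entity-tag: {value!r}")
    return weak, value[1:-1]


def strong_match(a: str, b: str) -> bool:
    """Strong comparison: neither tag is weak and the opaque-tags are identical."""
    weak_a, tag_a = parse_etag(a)
    weak_b, tag_b = parse_etag(b)
    return not weak_a and not weak_b and tag_a == tag_b


def weak_match(a: str, b: str) -> bool:
    """Weak comparison: opaque-tags are identical; the weak flag is ignored."""
    return parse_etag(a)[1] == parse_etag(b)[1]


# The four pairs from the comparison table in RFC 7232, section 2.3.2.
for a, b in [('W/"1"', 'W/"1"'), ('W/"1"', 'W/"2"'), ('W/"1"', '"1"'), ('"1"', '"1"')]:
    print(a, b, "weak:", weak_match(a, b), "strong:", strong_match(a, b))
```

Only the last pair passes the strong comparison, which is why a weak tag can never satisfy If-Match.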

You wrote: "In addition to this, the etag is meant to be generated from the actual bytes that are transmitted, so using the database version field for it is actually wrong per the RFC."

I don't find any evidence supporting this claim in RFC 7232; the examples in section 2.3.3 suggest that the claim is false.

You point out that section 2.3.3 gives two representations of the exact same resource, one with GZip and one without, that each have different ETags.

That's right - because the representations differ, the ETags must also differ.

But it doesn't follow that you need to use the actual bytes to generate the entity-tag. Any mapping of tags that is one-to-one with representations is allowed.

So you could have {database.version}.json, {database.version}.xml, {database.version}.xml.gz -- these are all tags that identify different representations of the same "version" of the resource.

This is described in section 2.3.1 (Generation):

"For example, a resource that has implementation-specific versioning applied to all changes might use an internal revision number, perhaps combined with a variance identifier for content negotiation, to accurately differentiate between representations."
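Here is the `{database.version}.{variant}` scheme as a minimal Python sketch. It assumes an integer revision number standing in for the database version field and a hypothetical `SUFFIXES` table mapping (media type, content coding) to a variance identifier. `make_etag`, `read_etag` and `TagInfo` are illustrative names, not part of any framework. Note that the double quotes are part of the entity-tag syntax.

```python
from dataclasses import dataclass
from typing import Optional

# Hypothetical table: (media type, content coding) -> variance identifier.
# Each representation needs its own identifier so tags stay one-to-one
# with representations.
SUFFIXES = {
    ("application/json", None): "json",
    ("application/xml", None): "xml",
    ("application/xml", "gzip"): "xml.gz",
}


@dataclass(frozen=True)
class TagInfo:
    version: int  # internal revision number (e.g. the database version field)
    variant: str  # variance identifier of the representation the tag was issued for


def make_etag(version: int, media_type: str, coding: Optional[str] = None) -> str:
    """Build a strong entity-tag such as "10.json"; the quotes are part of the syntax."""
    return f'"{version}.{SUFFIXES[(media_type, coding)]}"'


def read_etag(etag: str) -> TagInfo:
    """Parse a tag this server issued back into its revision and variant."""
    if etag.startswith("W/") or len(etag) < 2 or not (etag[0] == etag[-1] == '"'):
        raise ValueError(f"not a strong entity-tag: {etag!r}")
    # The revision comes first, so splitting on the first "." keeps "xml.gz" intact.
    version, _, variant = etag[1:-1].partition(".")
    if not version.isdecimal() or variant not in SUFFIXES.values():
        raise ValueError(f"not a tag issued by this server: {etag!r}")
    return TagInfo(int(version), variant)


print(make_etag(10, "application/json"))         # "10.json"
print(make_etag(10, "application/xml", "gzip"))  # "10.xml.gz"
print(read_etag('"10.xml.gz"'))                  # TagInfo(version=10, variant='xml.gz')
```

Clients still treat the tag as opaque. Only the server that minted it relies on being able to parse it back.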

Example: the underlying domain model is on version 10. I'm working in JSON, so I get a copy of the resource and start manipulating my copy:

GET /X

200 OK
ETag: "10.json"

Meanwhile, you also want to make a change, but for different reasons you are working with a different representation:

GET /X

200 OK
ETag: "10.xml.gz"

Now we both try to publish our changes to the server, introducing a data race between the two requests:

PUT /X
If-Match: "10.json"

PUT /X
If-Match: "10.xml.gz"

The server gets to choose how to support these requests, most likely either by serializing the handlers or by using some optimistic concurrency mechanism. The server sees that entity-tags are present and extracts the data it needs from them: the tags are opaque to the client, but not to the server that generated them. The winner of the race gets a success response (possibly with a new entity-tag that maps to the new representation); the loser gets a response with 412 Precondition Failed semantics.
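The race above can be sketched in Python with an in-memory resource. This is an illustration under stated assumptions, not a reference implementation. `Resource`, `etag` and `put` are hypothetical names. A `threading.Lock` stands in for "serializing the handlers", and the server compares only the revision number embedded in the tag, so any tag it issued for the current revision matches regardless of variant (the interpretation described in the answer and the comments). It does not handle `If-Match: *`, tag lists, or a missing header.

```python
import threading
from typing import Optional, Tuple


class Resource:
    """In-memory stand-in for /X; `version` plays the role of the database version field."""

    def __init__(self, body: str, version: int = 10) -> None:
        self.body = body
        self.version = version
        self._lock = threading.Lock()  # serializes check-and-update across handler threads

    def etag(self, variant: str) -> str:
        # Same scheme as above: "<revision>.<variant>", e.g. "10.json"
        return f'"{self.version}.{variant}"'

    @staticmethod
    def _revision(etag: str) -> Optional[int]:
        # Opaque to clients, but this server issued the tag and can parse it back.
        if etag.startswith("W/") or len(etag) < 2 or etag[0] != '"' or etag[-1] != '"':
            return None  # weak or malformed tags never pass a strong comparison
        head = etag[1:-1].partition(".")[0]
        return int(head) if head.isdecimal() else None

    def put(self, if_match: str, new_body: str, variant: str) -> Tuple[int, Optional[str]]:
        """Conditional PUT: returns (status code, new ETag or None)."""
        with self._lock:
            if self._revision(if_match) != self.version:
                return 412, None  # Precondition Failed: the client's copy is stale
            self.body = new_body
            self.version += 1
            return 200, self.etag(variant)


x = Resource("{}")
mine = x.etag("json")     # I fetched the JSON representation
yours = x.etag("xml.gz")  # you fetched the gzipped XML representation
print(x.put(mine, '{"a": 1}', "json"))  # (200, '"11.json"')
print(x.put(yours, "<x/>", "xml.gz"))   # (412, None)
```

Whichever request acquires the lock first wins. With a database, the same check-and-increment would more likely be a single conditional update on the version column than an in-process lock.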

VoiceOfUnreason
  • So, my reading of RFC 7232 section 2.3.3 exactly backs up my claim. It gives two representations of the exact same resource, one with GZip and one without, that each have different ETags. In particular, it then says "Content codings are a property of the representation data, so a strong entity-tag for a content-encoded representation has to be distinct from the entity tag of an unencoded representation to prevent potential conflicts during cache updates and range requests" – Graham Feb 23 '18 at 07:38
  • @Graham Yes, your reading is basically correct. However, you can still support optimistic locking, because although the etag is opaque to clients, it is not opaque to the server. As VoiceOfUnreason says, the server can make the connection between two etags (that they refer to the same "entity", regardless of representation), because it can parse back the etags it generated. – Robert Bräutigam Dec 09 '18 at 20:15