I have a web application that serves both HTML and multiple RDF formats (in the example below, it's RDF/XML). A page loads as HTML (naturally), and then requests its own URL as RDF/XML.

The problem: it looks like Firefox 74.0 (64-bit, on Windows) is mixing up the ETag values from those two requests, ignoring both the different Content-Type values and the presence of Vary: Accept.

When I reload the page, I can see it uses the ETag: "95e11fbc9e816b56" from the second (RDF/XML) response in the request for HTML, and vice versa:

Request URL: https://localhost:4443/6a6283d2-2a40-4882-b89d-8073a7c30e17/

Host: localhost:4443
User-Agent: Mozilla/5.0 (Windows NT 10.0; Win64; x64; rv:74.0) Gecko/20100101 Firefox/74.0
Accept: text/html,application/xhtml+xml,application/xml;q=0.9,image/webp,*/*;q=0.8
Accept-Language: en-US,en;q=0.5
Accept-Encoding: gzip, deflate, br
Referer: https://localhost:4443/6a6283d2-2a40-4882-b89d-8073a7c30e17/
Connection: keep-alive
Cookie: _ga=GA1.1.828629977.1584086266; LinkedDataHub.first-time-message=true
Upgrade-Insecure-Requests: 1
If-None-Match: "95e11fbc9e816b56"
Cache-Control: max-age=0

HTTP/1.1 200 OK
Server: Apache-Coyote/1.1
Strict-Transport-Security: max-age=31536000;includeSubDomains
ETag: "95e11fbc139f56de"
Cache-Control: max-age=3600, public
Last-Modified: Wed, 12 Feb 2020 23:05:15 GMT
Vary: Accept-Charset,Accept,Accept-Encoding
Content-Type: text/html;charset=UTF-8
Transfer-Encoding: chunked
Content-Encoding: gzip
Date: Sun, 22 Mar 2020 10:13:43 GMT

Request URL: https://localhost:4443/6a6283d2-2a40-4882-b89d-8073a7c30e17/

Host: localhost:4443
User-Agent: Mozilla/5.0 (Windows NT 10.0; Win64; x64; rv:74.0) Gecko/20100101 Firefox/74.0
Accept: application/rdf+xml
Accept-Language: en-US,en;q=0.5
Accept-Encoding: gzip, deflate, br
Referer: https://localhost:4443/d376ee88-ff7d-48ee-81c4-1220c9f482f0/
Connection: keep-alive
Cookie: _ga=GA1.1.828629977.1584086266; LinkedDataHub.first-time-message=true
If-None-Match: "95e11fbc139f56de"
Cache-Control: max-age=0

HTTP/1.1 200 OK
Server: Apache-Coyote/1.1
Strict-Transport-Security: max-age=31536000;includeSubDomains
ETag: "95e11fbc9e816b56"
Last-Modified: Wed, 12 Feb 2020 23:05:15 GMT
Vary: Accept-Charset,Accept
Content-Type: application/rdf+xml;charset=UTF-8
Transfer-Encoding: chunked
Date: Sun, 22 Mar 2020 10:13:55 GMT

On Chrome, I cannot get it to send If-None-Match headers at all, but this is probably due to the self-signed certificate.

Note that the ETag values are similar, but different: "95e11fbc139f56de" vs. "95e11fbc9e816b56".

This doesn't make any sense to me. Any explanations? Thanks.

The relevant specification is Hypertext Transfer Protocol (HTTP/1.1): Conditional Requests.


1 Answer

The problem, essentially, is that you're relying on behavior that isn't mandated by the HTTP standard, and doesn't happen to be implemented by browsers.

For your scheme to work, browsers would have to store multiple representations of a single resource in their cache. Unfortunately, as discussed in articles like these, they don't do that.

Browsers typically do not implement the capability to store multiple variations per URL. The rationale for this is that the things we typically use Vary for (mainly Accept-Encoding and Accept-Language) do not change frequently within the context of a single user.

So the issue isn't the ETags, it's that the browser is just overwriting the single representation in its cache each time it gets a different representation.
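To make that concrete, here is a minimal Python sketch of a cache that keeps one stored response per URL. It is not how Firefox is actually implemented; `SingleSlotCache`, `origin`, `fetch`, and the `example.org` URL are hypothetical names made up for illustration, and the only assumption is the behavior described above (one slot per URL, overwritten on every `200`). The two ETag values are the ones from the question.

```python
from dataclasses import dataclass


@dataclass
class Stored:
    """One cached response plus the request headers that selected it."""
    etag: str
    vary: tuple            # header names listed in the response's Vary
    request_headers: dict


class SingleSlotCache:
    """Hypothetical cache that keeps at most one response per URL."""

    def __init__(self):
        self.slots = {}

    def lookup(self, url, request_headers):
        """Return (same_variant, validator) for a new request."""
        stored = self.slots.get(url)
        if stored is None:
            return False, None
        same_variant = all(
            stored.request_headers.get(name) == request_headers.get(name)
            for name in stored.vary
        )
        # Even on a Vary mismatch, the only ETag the cache knows about
        # is still what goes into If-None-Match.
        return same_variant, stored.etag

    def store(self, url, request_headers, etag, vary):
        # Overwrites whatever representation was stored before.
        self.slots[url] = Stored(etag, vary, dict(request_headers))


# ETag values taken from the question, keyed by the Accept header.
ETAGS = {
    "text/html": '"95e11fbc139f56de"',
    "application/rdf+xml": '"95e11fbc9e816b56"',
}


def origin(accept, if_none_match):
    """Toy origin server: 304 only if the validator matches this variant."""
    etag = ETAGS[accept]
    return (304 if if_none_match == etag else 200), etag


def fetch(cache, url, accept):
    """Simulate a reload: always revalidate, as with Cache-Control: max-age=0."""
    headers = {"Accept": accept}
    _, validator = cache.lookup(url, headers)
    status, etag = origin(accept, validator)
    if status == 200:
        cache.store(url, headers, etag, ("Accept",))
    return status, validator


cache = SingleSlotCache()
url = "https://example.org/doc/"  # hypothetical URL
sequence = ["text/html", "application/rdf+xml"] * 2
log = [fetch(cache, url, accept) for accept in sequence]
for accept, (status, sent) in zip(sequence, log):
    print(accept, "If-None-Match:", sent, "->", status)
```

Under these assumptions every request after the first carries the ETag of the other representation, so the server never gets to answer `304`, which is the pattern in the headers you posted.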

If the browser did store multiple representations, the scheme should work fine. In that case, note that it would be the server, not the client, that selects between multiple ETags. The client would send an If-None-Match header with all the ETags it knows about, and it would be up to the server to decide which one, if any, matched the requested representation.
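As a rough sketch of that server-side selection (not your framework's actual API; `parse_if_none_match`, `opaque`, and `revalidate` are hypothetical helper names), the server compares every entity-tag listed in `If-None-Match` against the ETag of the representation it selected for this request. The parsing is simplified and assumes the tags themselves contain no commas.

```python
def parse_if_none_match(header):
    """Split an If-None-Match value into a list of entity-tags.

    Simplification: assumes no commas inside the quoted tags.
    """
    if header is None:
        return []
    return [tag.strip() for tag in header.split(",") if tag.strip()]


def opaque(tag):
    """Drop the weak prefix, since If-None-Match uses weak comparison."""
    return tag[2:] if tag.startswith("W/") else tag


def revalidate(if_none_match, current_etag):
    """Return 304 if any listed tag matches the selected representation."""
    tags = parse_if_none_match(if_none_match)
    if "*" in tags:
        return 304
    if any(opaque(tag) == opaque(current_etag) for tag in tags):
        return 304
    return 200


# ETag values taken from the question.
html_etag = '"95e11fbc139f56de"'
rdf_etag = '"95e11fbc9e816b56"'

# A cache holding both variants would list both tags.
print(revalidate(f"{html_etag}, {rdf_etag}", html_etag))
# A cache holding only the RDF/XML variant sends just that tag.
print(revalidate(rdf_etag, html_etag))
```

With both tags listed, the request for HTML can be answered with `304`; with only the RDF/XML tag, it falls through to a full `200`.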

According to the article above, edge servers (as opposed to browsers) do keep multiple representations in the cache for each resource, so it's still possible that your scheme could generate performance gains.

Kevin Christopher Henry
  • Kevin, thanks a lot for your answer. Google does not offer many hits on this. I want the server to be able to respond with `304` to conditional requests for each of the media types: HTML, RDF/XML and so on. They are generated from the same DB result (RDF graph), but are obviously not interchangeable. I thought providing different tags (content hash + content type hash) for each type was enough, because the browser would send a content-type-specific `If-None-Match`. But that is apparently not the case. I'm still confused -- is there a way to achieve what I want? Or is my thinking flawed? – Martynas Jusevičius Mar 22 '20 at 18:07
  • It makes sense when you say it :) But what I'm experiencing while debugging is that the browser sends the RDF representation's `ETag` when requesting HTML, and conversely the HTML representation's `ETag` when requesting RDF. So they never match on the server, and I always get `200` and not `304`. That's what the request/response examples are supposed to show. Looks like it's just using the latest `ETag` regardless of content type. I'm probably missing something trivial here. – Martynas Jusevičius Mar 22 '20 at 23:20
  • If I Edit and Resend the requests and swap the `ETags`, I get `304`... – Martynas Jusevičius Mar 22 '20 at 23:29
  • Thanks again. The answer to [this question](https://stackoverflow.com/questions/1975416/what-is-the-function-of-the-vary-accept-http-header) links to a [bug in Chrome](https://bugs.chromium.org/p/chromium/issues/detail?id=94369) which is exactly about this issue. This is a bummer though :/ Looks like broken behavior to me. – Martynas Jusevičius Mar 23 '20 at 09:03
  • [The browser cache is Vary broken](https://jakearchibald.com/2014/browser-cache-vary-broken/) – Martynas Jusevičius Mar 23 '20 at 09:31
  • @MartynasJusevičius: That article does a good job of explaining the situation, I'll add it to the answer. The bug report for Chrome is not the same issue, though. That's specific to the way History is implemented, and involves seeing incorrect data. What we're talking about here is just not getting as much caching as you might like. That's not a bug, since browsers are free to choose how much data to cache (or to not cache at all). That will never result in seeing incorrect data. – Kevin Christopher Henry Mar 23 '20 at 18:29
  • "The problem, essentially, is that you're relying on behavior that isn't mandated by the HTTP standard" - well, actually it is. – Julian Reschke Mar 24 '20 at 16:19
  • @JulianReschke: Where in the standard are clients required to store a separate response for every combination of `Vary`? That would be a strange requirement when "caching is an entirely OPTIONAL feature of HTTP". ([RFC 7234](https://tools.ietf.org/html/rfc7234#section-2)) – Kevin Christopher Henry Mar 24 '20 at 16:38
  • @KevinChristopherHenry - they do not need to cache, but if they do, they need to do it properly. – Julian Reschke Mar 25 '20 at 09:04
  • @JulianReschke: They are. As described in the articles I linked to, the browsers are taking `Vary` into account in checking the cached response for a match. (If they weren't, the OP's second request would have resulted in an inappropriate cache hit.) The only issue here is that the OP was hoping to take advantage of caching and revalidating multiple representations of a given resource when in fact the browsers will only store one at a time. – Kevin Christopher Henry Mar 25 '20 at 10:50
  • @JulianReschke can we get the exact reference where this is mentioned in the current HTTP spec? – Martynas Jusevičius Mar 28 '20 at 09:25
  • @MartynasJusevičius caching is optional, but if you do cache, you need to provide correct answers. That doesn't need to be specified separately. (And yes, as Kevin said, that doesn't necessarily mean that everything is cached the same way; it's just that the end-to-end behavior needs to conform to the spec.) – Julian Reschke Mar 29 '20 at 10:51
  • @JulianReschke I understand, but why not specify this explicitly? E.g. `Cache = (Vary ✕ Request-URI)*` or something like that – Martynas Jusevičius Apr 07 '20 at 12:58