
Nginx etag source

etag->value.len = ngx_sprintf(etag->value.data, "\"%xT-%xO\"",
                              r->headers_out.last_modified_time,
                              r->headers_out.content_length_n)
                  - etag->value.data;

r->headers_out.etag = etag;
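
For anyone who wants to see what that format string produces, here is a minimal standalone sketch in plain C. It assumes `%xT` and `%xO` print a `time_t` and an `off_t` in lowercase hex, and it uses `snprintf` with `%llx` as a stand-in for `ngx_sprintf`. `format_etag` is a hypothetical helper, not an nginx function.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

/*
 * Hypothetical helper (not part of nginx): formats an ETag as
 * "<last-modified time in hex>-<content length in hex>", quotes included.
 * Returns the number of characters written, or -1 if buf is too small.
 */
int format_etag(char *buf, size_t size,
                long long last_modified, long long content_length)
{
    assert(buf != NULL);

    /* %llx stands in for nginx's %xT / %xO (assumed: hex time_t / off_t). */
    int n = snprintf(buf, size, "\"%llx-%llx\"",
                     (unsigned long long) last_modified,
                     (unsigned long long) content_length);

    return (n < 0 || (size_t) n >= size) ? -1 : n;
}
```

With a modification time of `0x5cbac001` seconds and a length of `0x400` (1024) bytes, `format_etag` writes `"5cbac001-400"` into the buffer. Nothing about the file's bytes goes into the value.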

If a file's last-modified time on the server changes but its content does not, will the ETag value stay the same?

Why isn't the ETag value generated from a content hash?

blackraven
junlin

1 Answer


Why isn't the ETag value generated from a content hash?

Unless the nginx developers have documented the reason, it's hard to say.

My speculation is that they did it this way because it's very fast and takes constant time regardless of file size. Computing a hash can be costly, with the time needed growing with the size of the response. nginx, with its reputation for simplicity and speed, may not have been willing to add that overhead.
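
To make the cost difference concrete, here is a rough sketch of what a content-based ETag could look like. It uses 64-bit FNV-1a purely as a simple stand-in hash (my choice for illustration; a real server would likely want something stronger), and `format_hash_etag` is a hypothetical name. This is not something nginx does.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <stdio.h>
#include <string.h>

/* 64-bit FNV-1a: visits every byte, so the cost grows with the body size. */
uint64_t fnv1a64(const unsigned char *data, size_t len)
{
    uint64_t hash = 0xcbf29ce484222325ULL;   /* FNV-1a 64-bit offset basis */

    for (size_t i = 0; i < len; i++) {
        hash ^= data[i];
        hash *= 0x100000001b3ULL;            /* FNV 64-bit prime */
    }
    return hash;
}

/*
 * Hypothetical content-based ETag: the quoted, zero-padded hex hash.
 * Returns the number of characters written, or -1 if buf is too small.
 */
int format_hash_etag(char *buf, size_t size,
                     const unsigned char *data, size_t len)
{
    assert(buf != NULL);

    int n = snprintf(buf, size, "\"%016llx\"",
                     (unsigned long long) fnv1a64(data, len));

    return (n < 0 || (size_t) n >= size) ? -1 : n;
}
```

The loop in `fnv1a64` is the point: the whole body has to be read before the header can be sent, whereas the time-and-length scheme only needs two integers that the server already has from the file's metadata.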

If a file's last-modified time on the server changes but its content does not, will the ETag value stay the same?

No, it will not be the same and therefore the file will have to be re-served. The result is a slower response than you would get with a hash-based ETag, but the response will be correct.

The bigger concern with this algorithm is that the content could change while the ETag stays the same, in which case the response will be incorrect. This could happen if the file changes (in a way that keeps the same length) faster than the one-second precision of the Last-Modified time. (In theory a hash-based approach has the same issue—that is, it's possible for two different files to produce the same hash—but collisions are so unlikely that it's not a concern in practice.)
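
Here is a small sketch of that failure mode. The `file_version` struct and both function names are hypothetical, and I'm assuming the tag is built from whole seconds plus length, as in the nginx excerpt in the question.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

/* Hypothetical model of one version of a file; field names are illustrative. */
struct file_version {
    long long mtime_ms;  /* modification time, milliseconds since the epoch */
    long long size;      /* content length in bytes */
};

/* Builds "<seconds in hex>-<size in hex>"; sub-second detail is dropped. */
void version_etag(const struct file_version *v, char *buf, size_t size)
{
    assert(v != NULL && buf != NULL);

    snprintf(buf, size, "\"%llx-%llx\"",
             (unsigned long long) (v->mtime_ms / 1000),
             (unsigned long long) v->size);
}

/* Returns 1 when two versions would receive the same tag, 0 otherwise. */
int etags_collide(const struct file_version *a, const struct file_version *b)
{
    char ea[64], eb[64];

    version_etag(a, ea, sizeof(ea));
    version_etag(b, eb, sizeof(eb));
    return strcmp(ea, eb) == 0;
}
```

Two same-length versions written at, say, 100 ms and 900 ms into the same second make `etags_collide` return 1, whatever their bytes are. Move one write into the next second or change the length by a byte and it returns 0.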

So presumably nginx weighed this tradeoff—a faster response, but one that has a slight chance of being incorrect—and decided that it was worth it.

Kevin Christopher Henry
  • What do you mean by "This could happen if the file changes (in a way that keeps the same length) faster than the precision of the last modified header"? In my opinion, if the file changes while keeping the same content length, then its **timestamp** must differ from the previous **last-modified-time**. – junlin Apr 26 '19 at 14:08
  • @junlin: The *precision* of the `Last-Modified` header is one second. So if there are multiple changes to a file in less than a second, they will all have the same `Last-Modified` header. If they also have the same `Content-Length`, their nginx `ETag` will be the same, even though the content is different. – Kevin Christopher Henry Apr 26 '19 at 14:42
  • Oh I see, the **last-modified-time** is not a full **timestamp**, just a GMT date (e.g. `Sat, 20 Apr 2019 06:39:29 GMT`) with no milliseconds. Thanks! – junlin Apr 26 '19 at 14:58
  • We have a problem with gluster-backed volumes whereby a large file is still being written to disk, but the metadata for size and last-modified time does not change, even as the file is being written and finalized. NGINX picks up the "not quite written file" and the associated entropy (zero-stuffing), and the mangled file is now cached downstream with an ETag that won't change, even though the contents of the file do. In such a case, nginx's method of computing the ETag from size and last-modified time would not be sufficient to invalidate "poison cache" items that were cached during this time. – Patrick Scott Best May 17 '21 at 18:15