I noticed a couple of (ostensibly) harmless log entries and--admittedly overthinking this by a mile--got curious about Apache2 response sizes.
A Ukrainian crawler † hit my web server and requested the same URL again two seconds later. Apache2 replied with 41,298 bytes the first time and 41,244 bytes the second.
My question is:
Why are the response sizes different--by only 54 bytes--for the same URL?
I have not customized Apache's default cache declarations. If something were cached, I'd expect a difference near 100% of the requested content (or at least a lot more than the roughly 0.13% seen here).
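For scale, here is a quick sketch of that arithmetic. It uses only the two byte counts from the log entries; the variable names are just for illustration.

```shell
# Byte counts reported by Apache for the two identical requests.
first=41298
second=41244

diff_bytes=$((first - second))

# POSIX shell arithmetic is integer-only, so awk does the division.
pct=$(awk -v d="$diff_bytes" -v t="$first" 'BEGIN { printf "%.2f", d / t * 100 }')

echo "difference: ${diff_bytes} bytes (${pct}% of the first response)"
# prints: difference: 54 bytes (0.13% of the first response)
```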
All I can think of is a small file--a tiny GIF or .css file?--that inexplicably is the only component that's cached, but a search for files of exactly that size (54 bytes) produced no results:
find . -type f -size -55c -size +53c
Searching for 53 or 55 bytes instead finds small .GIF files, and widening the range by several bytes yields many more. Extending that guess, the "missing" response data might be a file plus its path--but that seems counter to how I thought caching works.
What am I seeing here?
ANCILLARY
These are the only two log entries from that address:
# grep -r 46.119.77.28 /var/log
/var/log/apache2/example.com-access.log:46.119.77.28 - - [26/Apr/2020:19:56:20 -0600] "GET / HTTP/1.0" 200 41298 "http://www.example.com/" "Mozilla/5.0 (Windows NT 6.1; WOW64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/67.0.3396.87 Safari/537.36 OPR/54.0.2952.64 (Edition Yx)"
/var/log/apache2/example.com-access.log:46.119.77.28 - - [26/Apr/2020:19:56:22 -0600] "GET / HTTP/1.0" 200 41244 "http://www.example.com/" "Mozilla/5.0 (Windows NT 6.1; WOW64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/67.0.3396.87 Safari/537.36 OPR/54.0.2952.64 (Edition Yx)"
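To pull just the byte counts out of entries like these, a minimal sketch along the following lines should work. It assumes the size is the 10th whitespace-separated field, which holds for the two lines above; `log_lines`, `sizes`, and `delta` are names made up for this example, and the sample lines are copied verbatim from the grep output.

```shell
# The two entries exactly as grep printed them (single-quoted so the
# embedded double quotes and parentheses are taken literally).
log_lines='/var/log/apache2/example.com-access.log:46.119.77.28 - - [26/Apr/2020:19:56:20 -0600] "GET / HTTP/1.0" 200 41298 "http://www.example.com/" "Mozilla/5.0 (Windows NT 6.1; WOW64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/67.0.3396.87 Safari/537.36 OPR/54.0.2952.64 (Edition Yx)"
/var/log/apache2/example.com-access.log:46.119.77.28 - - [26/Apr/2020:19:56:22 -0600] "GET / HTTP/1.0" 200 41244 "http://www.example.com/" "Mozilla/5.0 (Windows NT 6.1; WOW64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/67.0.3396.87 Safari/537.36 OPR/54.0.2952.64 (Edition Yx)"'

# Field 10 is the response size; fields 1-9 are the (prefixed) client
# address, two dashes, the two timestamp parts, the three request
# parts, and the status code.
sizes=$(printf '%s\n' "$log_lines" | awk '{ print $10 }')

# Subtract the second size from the first.
delta=$(printf '%s\n' "$log_lines" | awk '{ size[NR] = $10 } END { print size[1] - size[2] }')

printf 'sizes: %s\n' "$(echo $sizes)"
printf 'delta: %s bytes\n' "$delta"
```

Against the live log, the same awk programs could be fed from `grep -h 46.119.77.28 /var/log/apache2/example.com-access.log` instead of the pasted sample.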
† Around nine hours earlier it hit one of my other servers, and shortly afterward yet another. While it isn't actively probing for vulnerabilities, it's clearly crawling the web, so I've blocked it on principle--I have no immediate or planned need to be indexed outside the US.