I have a mystery here. The issue itself has been worked around, but I still can't see the actual cause. On our image sharing site Pixabay.com we recently implemented the srcset attribute for img tags on search results. You can see that in action here: https://pixabay.com/photos/

A typical img tag in there looked like this:

<img src="/image__180.jpg" srcset="/image__180.jpg 1x, /image__340.jpg 2x" alt="...">

It worked very well for about 99% of all users. However, a few reported seeing the issue shown in this screenshot:

[screenshot]

Some 30-50 images loaded correctly on the page, while the others showed up as broken images. We noticed that our NGINX log contained a few errors like this:

open() "/.../image__180.jpg" srcset="/image__180.jpg 1x, /image__340.jpg 2x" failed (2: No such file or directory)

Apparently, for an unknown reason, the client requested the whole expression (value of src + "srcset" + value of srcset) as the image path, which of course resulted in a 404 error.

We played around a bit and found that putting the srcset attribute before the src attribute on the img tags solves the issue. No more error logs, no more complaints.

<img srcset="/image__180.jpg 1x, /image__340.jpg 2x" src="/image__180.jpg" alt="...">

I couldn't find any reports of this behavior anywhere on the web. But I'd like to learn more. Here's the discussion on Pixabay with several users reporting the issue: https://pixabay.com/en/forum/help-me-please-11/pixabay-technical-difficulties-1474/?pagi=2

Do you have an explanation?

Simon Steinberger
  • My guess, it's just typos in HTML. – Alexey Ten Dec 17 '15 at 10:35
  • Are the users with this issue behind a proxy server? Or do they surf over a cellular network? Maybe a proxy or carrier between the user and the server tries to rewrite image URLs. – ausi Dec 17 '15 at 14:07
  • A typo is really ruled out here. We've triple-checked that, and as main developer of Pixabay, I'm pretty sure it's not that simple. Also, it's working well for millions of other users. At least some of the affected users don't use a cellular network, but I don't know whether a proxy is in use. My guess is also that some device in between is using a buggy parser or regex pattern or something like that. Anyway, it's an issue that occurs rarely, but here and there all over the world. – Simon Steinberger Dec 17 '15 at 14:55
  • Do all these errors appear on HTTPS pages? Because IMO a network carrier is not able to rewrite HTML over HTTPS connections without using a proxy. – ausi Dec 17 '15 at 15:04
  • Yes, it's HTTPS only. – Simon Steinberger Dec 17 '15 at 15:47

1 Answer


There is absolutely no way for a browser to screw this up normally. HTML parsers are rock-solid, they don't randomly eat extra bytes for an attribute.

This is definitely a proxy or some other MITM screwing with the markup somehow. I suggest dropping in some JS that quickly examines all the src attributes on the page and checks if any contain "srcset", and if so, logging as much information as you can about the UA or whatever, so you can try to find a commonality between them.
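A minimal sketch of that check, assuming a modern browser. The function name `findMangledSrcs` and the logging endpoint `/log/mangled-src` are made up for illustration; swap in whatever your site actually uses for logging.

```javascript
// Returns the src values that appear to have swallowed the srcset attribute.
// `images` is any iterable of objects exposing getAttribute("src"),
// e.g. document.images in a browser.
function findMangledSrcs(images) {
  const mangled = [];
  for (const img of images) {
    const src = img.getAttribute("src") || "";
    // A clean src is just a path; the word "srcset" should never be in it.
    if (src.includes("srcset")) {
      mangled.push(src);
    }
  }
  return mangled;
}

// Browser-only part: report what was found once the page has loaded.
if (typeof window !== "undefined" && typeof document !== "undefined") {
  window.addEventListener("load", () => {
    const mangled = findMangledSrcs(document.images);
    if (mangled.length > 0) {
      const report = JSON.stringify({
        userAgent: navigator.userAgent,
        page: location.href,
        samples: mangled.slice(0, 5), // a few examples are enough
      });
      // "/log/mangled-src" is a hypothetical endpoint, not a real Pixabay URL.
      navigator.sendBeacon("/log/mangled-src", report);
    }
  });
}
```

Since the script reads the attribute from the parsed DOM, it only sees the damage if the markup was already rewritten before the browser parsed it, which is exactly the case this is meant to catch.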

I suspect it's some weird proxy examining/rewriting the source, using a regex like /image.*.jpg/ and writing the match back URL-escaped. That'll catch everything from the start of your src image up to the final .jpg in your srcset, and escape all the spaces and quotes between them so you get a single big src attribute value.
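To illustrate the guess, here is a small sketch of how a greedy pattern plus URL-escaping could produce that result. The rewriter function is purely hypothetical; nothing here is known about what the real software in between actually does.

```javascript
// The img tag from the question, with src before srcset.
const original =
  '<img src="/image__180.jpg" srcset="/image__180.jpg 1x, /image__340.jpg 2x" alt="...">';

// The greedy pattern guessed above: ".*" runs on to the LAST "jpg" in the tag.
const greedy = /image.*.jpg/;

// Hypothetical buggy rewriter: URL-escape whatever the pattern matched.
function rewriteLikeBuggyProxy(html) {
  return html.replace(greedy, (match) => encodeURI(match));
}

const rewritten = rewriteLikeBuggyProxy(original);

// The quotes that used to close src and open srcset are now %22, so the
// HTML parser sees one long src value that ends after "2x".
const mangledSrc = /src="([^"]*)"/.exec(rewritten)[1];

console.log(rewritten);
console.log(decodeURI(mangledSrc));
```

Decoded, the resulting src has the same shape as the path in the NGINX error: the src value, then `" srcset="`, then the srcset value.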

Alternately, since this is apparently delivered over HTTPS, which reduces the chance of proxy rewriting, it may be a badly-behaved extension.

Xanthir
  • I've accepted your answer because it just makes sense, and especially for the idea of logging data through JavaScript. That should help solve the case. If I find out more, I'll post it here. – Simon Steinberger Dec 17 '15 at 22:20