
I am trying to wget a list of URLs (images). Some of them no longer exist, and the host redirects to a generic "this image doesn't exist" page whose URL I know. I would like wget to fetch each file unless it 302s to that page. Is this possible?

I can stop wget from fetching a file on any redirect with the --max-redirect=0 flag, but that may also skip real images if I hit a mirror.

Aly
  • Can you clarify what that has to do with https or htaccess? – Shane Madden Jan 07 '15 at 19:13
  • @ShaneMadden Oh, sorry. The title seems to be left over from an old, half-edited post and was put there in error; I will change it. – Aly Jan 07 '15 at 23:15
  • Ahh, ok. Unfortunately, I can't find a way to do this (`--exclude-domains` apparently doesn't work when it's a `302`). Are all the images from a specific host, and is the mirror redirect problem likely to happen? – Shane Madden Jan 07 '15 at 23:33
  • @ShaneMadden A lot are from Flickr, which redirects any non-existent link to this particular "image not found" image. – Aly Jan 08 '15 at 14:26

1 Answer


The only (really hacky) way I can think of to accomplish this is to put an HTTP proxy in front of wget and have it answer requests for the "image not found" URL with an error code, so that wget never downloads it.

Any configurable proxy should be able to do this. With Apache, for example, you could do something like:

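# Act as a forward proxy (requires mod_proxy and mod_proxy_http)
ProxyRequests On
# Refuse to proxy the known "image not found" URL; the client gets a 403 instead
<Proxy http://example.com/path/to/image-not-found.jpg>
    # Apache 2.2 syntax; on Apache 2.4 use "Require all denied"
    Order allow,deny
    Deny from all
</Proxy>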
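
If running a proxy is not an option, the same idea (follow redirects, but refuse the one that lands on the known "image not found" URL) can be sketched without wget. The following is a minimal Python sketch of that alternative, not a wget feature. It assumes the placeholder URL from the config above, compares only host and path (so an http/https difference is ignored), and the names `is_not_found`, `RejectNotFoundRedirect`, and `fetch` are hypothetical helpers made up for illustration.

```python
import shutil
import urllib.error
import urllib.request
from urllib.parse import urlsplit

# Assumption: the known "image not found" URL (placeholder from the config above).
NOT_FOUND_URL = "http://example.com/path/to/image-not-found.jpg"


def is_not_found(url, not_found_url=NOT_FOUND_URL):
    """Return True if url points at the known not-found page (host and path only)."""
    a, b = urlsplit(url), urlsplit(not_found_url)
    return (a.hostname, a.path) == (b.hostname, b.path)


class RejectNotFoundRedirect(urllib.request.HTTPRedirectHandler):
    """Follow redirects as usual, except those that target the not-found page."""

    def redirect_request(self, req, fp, code, msg, headers, newurl):
        if is_not_found(newurl):
            # Turn the redirect into an error so nothing is downloaded.
            raise urllib.error.HTTPError(
                req.full_url, 404, "redirected to not-found page", headers, fp
            )
        return super().redirect_request(req, fp, code, msg, headers, newurl)


def fetch(url, dest):
    """Download url to dest; return False on an HTTP error or a rejected redirect."""
    opener = urllib.request.build_opener(RejectNotFoundRedirect)
    try:
        # dest is only created once the request has succeeded.
        with opener.open(url, timeout=30) as resp, open(dest, "wb") as out:
            shutil.copyfileobj(resp, out)
        return True
    except urllib.error.HTTPError:
        return False
```

Redirects to any other host, such as a mirror, are still followed, which is the behaviour that --max-redirect=0 cannot give you.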
Shane Madden