
I'm trying to figure out how to verify that what I'm feeding into CarrierWave is actually an image. The source I'm getting my image URLs from doesn't return only live URLs; some of the images no longer exist. Unfortunately, it doesn't return the right status codes either: I was using some code to check whether the remote file exists, and dead URLs were passing that check. So, to be on the safe side, I'd like a way to verify that I'm getting back a valid image file before I go ahead and download it.

Here is the remote file checking code I was using, just for reference, but I'd prefer something that can actually identify the files as images.

require 'open-uri'
require 'net/http'

def remote_file_exists?(url)
    url = URI.parse(url)
    Net::HTTP.start(url.host, url.port) do |http|
      return http.head(url.request_uri).code == "200"
    end
end
hadees

3 Answers


I would check whether the service returns a proper MIME type in the Content-Type HTTP header. (here's a list of MIME types)

For example, the Content-Type of the Stack Overflow homepage is text/html; charset=utf-8, and the Content-Type of your gravatar image is image/png.

To check in Ruby, using Net::HTTP, whether the Content-Type header indicates an image, you could use the following:

require 'net/http'
require 'uri'

def remote_file_exists?(url)
  url = URI.parse(url)
  Net::HTTP.start(url.host, url.port) do |http|
    # to_s guards against a missing Content-Type header (nil)
    return http.head(url.request_uri)['Content-Type'].to_s.start_with?('image')
  end
end
Rick Button
  • I ended up checking the status code first just to make sure it was 200 then using your code. So something like `return (head.code == "200") ? head['Content-Type'].start_with?('image') : false` – hadees Mar 06 '12 at 02:42
  • That should work. You should also ask the service provider why they return a 200 status code when the resource doesn't actually exist. – Rick Button Mar 06 '12 at 03:31
  • Yeah that is a good question but I'm guessing they aren't technical enough to answer it. Also I don't have direct access to them. Hopefully this will work for most cases. – hadees Mar 06 '12 at 03:52
  • Headers can be spoofed rather easily in many cases, so relying solely on the mime-type seems somewhat dangerous and possibly insecure. It might make sense to use ruby-filemagic to check the type by magic number instead or at least additionally. – jaydel Jul 29 '12 at 19:58
  • From his most recent comment (replying to my comment) he seems to be using a single source for his images. If he was crawling a lot of sites, I would not trust it, but if it is one trusted source then he should be ok. – Rick Button Jul 29 '12 at 22:15
  • That solution works but a shorter version is http://stackoverflow.com/a/6400803/365950 – Zubin Aug 14 '12 at 03:06
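
The magic-number check suggested in the comments above can be sketched in plain Ruby, without the ruby-filemagic gem. This is only an illustration: the constant `IMAGE_SIGNATURES` and the method `sniff_image_format` are hypothetical names, and the signature list covers just PNG, JPEG, and GIF. It inspects the first bytes of a body you have already fetched, so it complements the header check and does not replace it.

```ruby
# Leading byte signatures for a few common image formats.
# The list is illustrative, not exhaustive.
IMAGE_SIGNATURES = {
  png:  "\x89PNG\r\n\x1A\n".b,
  jpeg: "\xFF\xD8\xFF".b,
  gif:  "GIF8".b
}.freeze

# Returns the detected format as a symbol, or nil when no signature matches.
# `bytes` is expected to hold the first few bytes of the response body.
def sniff_image_format(bytes)
  data = bytes.to_s.b # treat the input as raw bytes, tolerate nil
  match = IMAGE_SIGNATURES.find { |_format, signature| data.start_with?(signature) }
  match && match.first
end

sniff_image_format("GIF89a".b)  # => :gif
sniff_image_format("<html>")    # => nil
```

An HTML error page served with a 200 status and an image Content-Type would fail this check, which is the case the header test alone cannot catch.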

Rick Button's answer worked for me, but I needed to add SSL support:

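def self.remote_image_exists?(url)
  url = URI.parse(url)
  http = Net::HTTP.new(url.host, url.port)
  http.use_ssl = (url.scheme == "https")

  # start yields the connection itself, so the block needs no parameter
  http.start do
    # to_s guards against a missing Content-Type header (nil)
    return http.head(url.request_uri)['Content-Type'].to_s.start_with?('image')
  end
end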
machineboy2045

I ended up using HTTParty for this. The Net::HTTP request from Rick Button's answer kept timing out.

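  require 'httparty'

  def remote_file_exists?(url)
    response = HTTParty.get(url)
    # the parentheses are required; without them the && expression does not parse
    response.code == 200 && response.headers['Content-Type'].to_s.start_with?('image')
  end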

https://github.com/jnunemaker/httparty
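
If you would rather stay with Net::HTTP, one possible way to deal with the timeouts mentioned above is to set them explicitly and treat any network error as "not an image". The sketch below is an assumption-laden illustration: `image_response?` and `remote_image?` are hypothetical names, and the 5-second default is arbitrary. Whether it avoids the timeouts in practice depends on how the remote server handles HEAD requests.

```ruby
require 'net/http'
require 'uri'

# True only for a 200 response whose Content-Type starts with "image".
# A missing Content-Type header (nil) counts as "not an image".
def image_response?(code, content_type)
  code.to_s == '200' && content_type.to_s.start_with?('image')
end

# Sends a HEAD request with explicit timeouts (in seconds).
# Any error (bad URL, DNS failure, timeout, refused connection) yields false.
def remote_image?(url, timeout: 5)
  uri = URI.parse(url)
  Net::HTTP.start(uri.host, uri.port,
                  use_ssl: uri.scheme == 'https',
                  open_timeout: timeout,
                  read_timeout: timeout) do |http|
    response = http.head(uri.request_uri)
    image_response?(response.code, response['Content-Type'])
  end
rescue StandardError
  false
end
```

Unlike `HTTParty.get`, a HEAD request does not download the body, which matters when the goal is to check a URL before fetching the file.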

Phil Sturgeon
Hans Hauge