I am using HtmlAgilityPack to get the title, description, and images from the content of a URL. Everything works fine except for the images: sometimes an image URL returns just a blank image. I created a test method to find out whether the image exists:

var request = (HttpWebRequest)WebRequest.Create(imageUrl);
request.Credentials = CredentialCache.DefaultCredentials;
request.Method = "HEAD";
using (var response = (HttpWebResponse)request.GetResponse())
{
    if (response.StatusCode == HttpStatusCode.OK)
    {
        // do something
    }
}

But sometimes I still get a blank image. The response is fine: I get the correct HttpStatusCode and ContentType="image/png" or ContentType="image/jpg". I see the same blank image when I navigate to that image URL in a web browser. I thought about accepting only images above a minimum length, but that is a bad idea. Does anybody know how to "exclude" such blank images?

sergiogarciadev
niao

1 Answer

First, check that you are using the correct method: HEAD returns only the headers and no actual content. To inspect the image itself, you should use GET.
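Once the body has been downloaded with GET, one possible way to detect a "blank" image is to decode it and check whether every sampled pixel has the same color. The following is only a sketch: it assumes that "blank" means a single uniform color, that System.Drawing is available (as on .NET Framework on Windows), and the names BlankImageCheck, IsSingleColor, and IsBlank are hypothetical helpers, not part of any library.

```csharp
using System.Drawing;
using System.IO;

static class BlankImageCheck
{
    // Returns true when every sampled pixel has the same ARGB value as pixel (0, 0).
    // Only every `step`-th pixel in each direction is sampled to keep the check cheap.
    public static bool IsSingleColor(Bitmap bitmap, int step = 4)
    {
        int first = bitmap.GetPixel(0, 0).ToArgb();
        for (int y = 0; y < bitmap.Height; y += step)
        {
            for (int x = 0; x < bitmap.Width; x += step)
            {
                if (bitmap.GetPixel(x, y).ToArgb() != first)
                {
                    return false;
                }
            }
        }
        return true;
    }

    // Decodes the downloaded bytes and applies the uniform-color check.
    public static bool IsBlank(byte[] imageBytes)
    {
        using (var stream = new MemoryStream(imageBytes))
        using (var bitmap = new Bitmap(stream))
        {
            return IsSingleColor(bitmap);
        }
    }
}
```

Sampling can miss small details that fall between sampled pixels, so a step of 1 is the safer choice if the images are small.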

Also, you said you sometimes get the same blank image in the browser. If so, the site you are parsing may be preventing hotlinking of its images.

Hotlink prevention works by checking the Referer header of the image request (the page that contains the image). When you enter the image URL directly in the browser, the Referer is empty.

You can send the correct Referer when you download the image, and then you will probably get the correct image, as shown here:

var request = (HttpWebRequest)WebRequest.Create(imageUrl);
request.Credentials = CredentialCache.DefaultCredentials;
request.Method = "GET";
// Send the URL of the parsed page as the Referer so the request looks like a normal page load.
request.Referer = urlOfThePageYouJustParsed;
using (var response = (HttpWebResponse)request.GetResponse())
{
    if (response.StatusCode == HttpStatusCode.OK)
    {
        // do something
    }
}
Software Engineer
sergiogarciadev
  • I tried your solution, but unfortunately it does not seem to work. – niao May 14 '14 at 16:42
  • Maybe the host expects the cookies that were set when you first requested the page. Please check this answer, [How to add cookies to WebRequest?](http://stackoverflow.com/questions/11164275/how-to-add-cookies-to-webrequest), and see if it works now. – sergiogarciadev May 14 '14 at 16:47
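
The cookie suggestion in the last comment could be sketched roughly as follows: request the page first with a CookieContainer attached, then reuse the same container (plus the Referer) for the image request. This is an untested sketch, not a confirmed fix for the site in question; ImageFetcher and DownloadImageWithSession are hypothetical names, and pageUrl and imageUrl stand for the page you parsed and the image found on it.

```csharp
using System;
using System.IO;
using System.Net;

static class ImageFetcher
{
    // Hypothetical helper: loads the page first so the server can set its cookies,
    // then requests the image with those cookies and the page URL as the Referer.
    public static byte[] DownloadImageWithSession(string pageUrl, string imageUrl)
    {
        var cookies = new CookieContainer();

        var pageRequest = (HttpWebRequest)WebRequest.Create(pageUrl);
        pageRequest.CookieContainer = cookies;
        using (var pageResponse = (HttpWebResponse)pageRequest.GetResponse())
        {
            // Nothing to read here: cookies from the response are stored in the container.
        }

        var imageRequest = (HttpWebRequest)WebRequest.Create(imageUrl);
        imageRequest.Method = "GET";
        imageRequest.Referer = pageUrl;
        imageRequest.CookieContainer = cookies;

        using (var response = (HttpWebResponse)imageRequest.GetResponse())
        using (var stream = response.GetResponseStream())
        using (var buffer = new MemoryStream())
        {
            // Copy the whole image body into memory and return it as bytes.
            stream.CopyTo(buffer);
            return buffer.ToArray();
        }
    }
}
```

GetResponse throws a WebException for non-success status codes, so the caller may want to wrap the call in a try/catch.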