
I have no problem accessing the website in a browser, but when I try to access it programmatically for scraping, I get the following error.

The remote server returned an error: (500) Internal Server Error.

Here is the code I'm using.

using System;
using System.IO;
using System.Net;

string strURL1 = "http://www.covers.com/index.aspx";
WebRequest req = WebRequest.Create(strURL1);

// Get the stream from the returned web response
StreamReader stream = new StreamReader(req.GetResponse().GetResponseStream());
System.Text.StringBuilder sb = new System.Text.StringBuilder();
string strLine;
// Read the stream a line at a time and append each non-empty line to the StringBuilder
while ((strLine = stream.ReadLine()) != null)
{
  if (strLine.Length > 0)
    sb.Append(strLine + Environment.NewLine);
}

stream.Close();

This one has me stumped. TIA

Trey Balut

2 Answers


It's the user agent.

Many sites, like the one you're attempting to scrape, validate the User-Agent string in an attempt to stop you from scraping them. As it has here, that check quickly trips up anyone making a plain, unadorned request. It's not a very solid way of stopping a scrape, but it stumps some people.

Setting the User-Agent string will work. Change the code to:

HttpWebRequest req = (HttpWebRequest)WebRequest.Create(strURL1);
req.UserAgent = "Mozilla/5.0 (Windows NT 6.1) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/41.0.2228.0 Safari/537.36"; // Chrome user agent string

...and it will be fine.
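For completeness, here is a minimal sketch of the question's loop with that change applied. It assumes the same URL and classic WebRequest API as in the question; the user-agent value is just the Chrome string above and any realistic browser string should do:

using System;
using System.IO;
using System.Net;
using System.Text;

class Program
{
    static void Main()
    {
        string strURL1 = "http://www.covers.com/index.aspx";

        // Cast to HttpWebRequest so the UserAgent property is available
        HttpWebRequest req = (HttpWebRequest)WebRequest.Create(strURL1);
        req.UserAgent = "Mozilla/5.0 (Windows NT 6.1) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/41.0.2228.0 Safari/537.36";

        StringBuilder sb = new StringBuilder();

        // Dispose the response and reader when done
        using (WebResponse resp = req.GetResponse())
        using (StreamReader reader = new StreamReader(resp.GetResponseStream()))
        {
            string strLine;
            while ((strLine = reader.ReadLine()) != null)
            {
                if (strLine.Length > 0)
                    sb.AppendLine(strLine);
            }
        }

        Console.WriteLine(sb.Length + " characters downloaded");
    }
}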

Simon Whitehead

It looks like it's doing some sort of user-agent checking. I was able to replicate your problem in PowerShell, but I noticed that the PowerShell cmdlet Invoke-WebRequest was working fine.

So I hooked up Fiddler, reran it, and stole the user-agent string out of Fiddler.

Try setting the UserAgent property to: Mozilla/5.0 (Windows NT; Windows NT 6.2; en-US) WindowsPowerShell/4.0
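Applied to the code in the question, that would look something like this (a sketch; the only changes from the original are the cast and the UserAgent assignment):

// Cast to HttpWebRequest so the UserAgent property can be set,
// then present the PowerShell user-agent string captured in Fiddler
HttpWebRequest req = (HttpWebRequest)WebRequest.Create(strURL1);
req.UserAgent = "Mozilla/5.0 (Windows NT; Windows NT 6.2; en-US) WindowsPowerShell/4.0";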

Daniel Mann