
I am using HttpWebRequest to GET results from Google, and I go through proxies to fetch the data. The strange problem is that some queries return data while others throw the exception "The remote server returned an error: (503) Server Unavailable." One might think the proxy is bad, but if you configure the same proxy in Internet Explorer and open Google, the page loads with no 503 error. HttpWebRequest only fails on certain queries. For example, if you request

http://www.google.com/search?q=site:http://www.yahoo.com 

it throws the exception, whereas if you request

http://www.google.com/search?q=info:http://www.yahoo.com

it works.

My code so far is:

HttpWebRequest request = (HttpWebRequest)WebRequest.Create(file);
request.ProtocolVersion = HttpVersion.Version11;
request.Method = "GET";
request.KeepAlive = false;
request.ContentType = "text/html";
request.Timeout = 1000000000;
request.ReadWriteTimeout = 1000000000;
request.UseDefaultCredentials = true;
request.Credentials = CredentialCache.DefaultCredentials;

// Route the request through the currently selected proxy.
Uri newUri = new Uri("http://" + proxy[selectedProxy].ProxyAddress.Trim() + "/");
WebProxy myProxy = new WebProxy();
myProxy.Credentials = CredentialCache.DefaultCredentials;
myProxy.Address = newUri;
request.Proxy = myProxy;

// GetResponse() is where the 503 WebException is thrown.
WebResponse response = request.GetResponse();
// System.Threading.Thread.Sleep(Delay);
StreamReader reader = null;
string data = null;
reader = new StreamReader(response.GetResponseStream());
data = reader.ReadToEnd();
Afnan Bashir

3 Answers


You are being hit with the "sorry, you are a spambot" message and will need to either enter the captcha to continue or change proxy. HttpWebRequest does not hand you the page contents by default when you get a 503 error (it throws instead), although if you do the same thing in the browser, the contents will be displayed to you.
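
The "change proxy" option can be automated. The following is only a minimal sketch, not the asker's code: `ProxyRotation`, `DownloadViaFirstWorkingProxy` and `DownloadWithWebClient` are hypothetical names, and the proxy list is assumed to hold plain `host:port` strings like the `ProxyAddress` values in the question. The download step is passed in as a delegate so the loop can be tried without touching the network.

```csharp
using System;
using System.Collections.Generic;
using System.Net;

static class ProxyRotation
{
    // True when the failure carries an HTTP 503 response.
    public static bool IsServiceUnavailable(WebException ex)
    {
        var http = ex.Response as HttpWebResponse;
        return http != null && http.StatusCode == HttpStatusCode.ServiceUnavailable;
    }

    // Tries each proxy in order and returns the first body that downloads.
    // Returns null when every proxy was answered with a 503.
    public static string DownloadViaFirstWorkingProxy(
        string url,
        IEnumerable<string> proxyAddresses,
        Func<string, string, string> download)
    {
        foreach (var address in proxyAddresses)
        {
            try
            {
                return download(url, address);
            }
            catch (WebException ex)
            {
                // Only a 503 triggers a proxy switch; other errors propagate.
                if (!IsServiceUnavailable(ex)) throw;
            }
        }
        return null;
    }

    // Default downloader: a WebClient routed through the given proxy.
    public static string DownloadWithWebClient(string url, string proxyAddress)
    {
        using (var client = new WebClient())
        {
            client.Proxy = new WebProxy(new Uri("http://" + proxyAddress.Trim() + "/"));
            return client.DownloadString(url);
        }
    }
}
```

A call would look like `ProxyRotation.DownloadViaFirstWorkingProxy(url, addresses, ProxyRotation.DownloadWithWebClient)`. Switching proxies only helps if the next proxy's IP address has not been flagged as well.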

Martin

That's weird. Maybe it's a URL encoding issue. Try the following, which should take care of properly handling everything:

using System;
using System.Net;
using System.Web;

class Program
{
    static void Main()
    {
        using (var client = new WebClient())
        {
            var newUri = new Uri("http://proxy.foo.com/");
            var myProxy = new WebProxy();
            myProxy.Credentials = CredentialCache.DefaultCredentials;
            myProxy.Address = newUri;
            client.Proxy = myProxy;

            var query = HttpUtility.ParseQueryString(string.Empty);
            query["q"] = "info:http://www.yahoo.com";
            var url = new UriBuilder("http://www.google.com/search");
            url.Query = query.ToString();
            Console.WriteLine(client.DownloadString(url.ToString()));
        }
    }
}
Darin Dimitrov
  • `HttpUtility` is not visible; it is a WinForms app – Afnan Bashir Jun 19 '11 at 21:16
  • @Lagrangian, add reference to `System.Web` and if this is a .NET 4.0 client profile just test it in a separate application using the full framework profile. I am curious to know the outcome. If this works there is a similar technique that could be used for the client profile. – Darin Dimitrov Jun 19 '11 at 21:17
  • Same exception. I don't know why, but if you replace info in the query with site it does not work, and with info it works. Same results as with my method, so encoding is not the problem, I think – Afnan Bashir Jun 19 '11 at 21:27
  • @Lagrangian, yeap, seems like you are using some crappy proxy, coz both examples work perfectly fine on my machine without a proxy. What's the contents of the 503 error page you are getting from the proxy? Does it provide some more details about why it failed? – Darin Dimitrov Jun 19 '11 at 21:29
  • I tried and it works on my system without a proxy too, but how come Internet Explorer does the trick? – Afnan Bashir Jun 19 '11 at 21:30
  • @Lagrangian, hmm, good point. How about using FireBug, to capture the exact headers that are being sent and try to add those headers to the request. – Darin Dimitrov Jun 19 '11 at 21:31
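
A rough sketch of what copying captured headers onto the request could look like. `BrowserLikeRequest` is a hypothetical helper, and the header values below are placeholder assumptions; they would be replaced with whatever the browser capture actually shows. Note that `User-Agent` and `Accept` are restricted headers on HttpWebRequest and have to be set through their properties rather than through the `Headers` collection.

```csharp
using System;
using System.Net;

static class BrowserLikeRequest
{
    // Builds a GET request carrying browser-style headers.
    // All header values are illustrative placeholders, not captured data.
    public static HttpWebRequest Create(string url, IWebProxy proxy)
    {
        var request = (HttpWebRequest)WebRequest.Create(url);
        request.Method = "GET";

        // Leave the default proxy in place when none is supplied.
        if (proxy != null)
        {
            request.Proxy = proxy;
        }

        // Restricted headers: set via properties.
        request.UserAgent = "Mozilla/5.0 (compatible; MSIE 9.0; Windows NT 6.1; Trident/5.0)";
        request.Accept = "text/html,application/xhtml+xml,application/xml;q=0.9,*/*;q=0.8";

        // Unrestricted headers can go through the Headers collection.
        request.Headers["Accept-Language"] = "en-US,en;q=0.5";

        // Transparently decompress gzip/deflate responses, as a browser would.
        request.AutomaticDecompression =
            DecompressionMethods.GZip | DecompressionMethods.Deflate;

        return request;
    }
}
```

Matching headers may or may not change the outcome; if the proxy's IP address is already rate-limited, the 503 will most likely persist.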

It depends on how often you send queries to Google from the same IP address. If you send your queries too fast, Google will block your IP address. When this happens, Google returns a 503 error with a redirect to their sorry page.

Do something like this:

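try
{
    response = (HttpWebResponse) webRequest.GetResponse();
}
catch (WebException ex)
{
    // ex.Response is null when no HTTP response was received (e.g. a timeout).
    if (ex.Response == null) throw;

    using (var sr = new StreamReader(ex.Response.GetResponseStream()))
    {
        // Body of the error page that came with the 503.
        var html = sr.ReadToEnd();
    }
}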

When debugging, check the value of the html variable. You will see that it is an HTML page asking you to fill in a captcha code.
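
To make that check in code rather than in the debugger, something along these lines could work. This is a sketch under assumptions: `GoogleBlockCheck` is a hypothetical helper, and the marker strings `"captcha"` and `"/sorry/"` are guesses at what the block page contains, so they should be confirmed against the actual html first.

```csharp
using System;
using System.IO;
using System.Net;

static class GoogleBlockCheck
{
    // Heuristic: a 503 whose body mentions a captcha or the sorry page.
    // The marker strings are assumptions, not a documented format.
    public static bool LooksLikeBlockPage(HttpStatusCode status, string html)
    {
        if (status != HttpStatusCode.ServiceUnavailable || html == null)
        {
            return false;
        }
        return html.IndexOf("captcha", StringComparison.OrdinalIgnoreCase) >= 0
            || html.IndexOf("/sorry/", StringComparison.OrdinalIgnoreCase) >= 0;
    }

    // Reads the body attached to a WebException, or returns null if there is none.
    public static string ReadErrorBody(WebException ex)
    {
        if (ex.Response == null)
        {
            return null;
        }
        using (var stream = ex.Response.GetResponseStream())
        using (var reader = new StreamReader(stream))
        {
            return reader.ReadToEnd();
        }
    }
}
```

Inside the catch block, the status code is available as `((HttpWebResponse)ex.Response).StatusCode`; a positive result is a signal to slow down or switch to another IP address.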

rubenj