I am trying to create a Windows service. Its purpose is to pick up URLs from a database and check their PageRank with Google, so that we can catch anyone faking their PageRank. I found some code at http://www.codeproject.com/KB/aspnet/Google_Pagerank.aspx and used it.

Here is the code:

    // Requires: using System; using System.IO; using System.Net; using System.Windows.Forms;
    public static int GetPageRank()
    {
        string file = "http://toolbarqueries.google.com/search?q=info:codeproject.com";
        try
        {
            // Request PR from Google
            WebRequest request = WebRequest.Create(file);
            string data;

            // The using blocks close the response and reader even if reading throws
            using (WebResponse response = request.GetResponse())
            using (StreamReader reader = new StreamReader(response.GetResponseStream()))
            {
                data = reader.ReadToEnd();
            }

            // Parse PR from string: keep only the text after the last ':'
            if (data.IndexOf(':') != -1)
            {
                data = data.Substring(data.LastIndexOf(':') + 1);
            }

            // int.TryParse sets its out argument to 0 on failure, so return -1 explicitly
            int pageRank;
            return int.TryParse(data, out pageRank) ? pageRank : -1;
        }
        catch (Exception ex)
        {
            MessageBox.Show(ex.Message);
            return -1;
        }
    }

After this method has been called a number of times (around 100), I start getting the following exception: "The remote server returned an error: (503) Server Unavailable". I have done some research and have seen a related question on Stack Overflow as well. Apparently Google stops serving requests if too many of them originate from the same IP address. Are there any workarounds that would let me check several thousand PageRanks in, say, two or three hours?

Syed Salman Akbar

2 Answers

"Are there any workarounds that would let me check several thousand PageRanks in, say, two or three hours?"

Nope. You're simply requesting too much data. There might be a JSON or XML API to get batch responses, but I am not aware of any from Google.

komiga

In the end, we got proxies from a proxy provider and used them. We had to use a semaphore so that every thread would be assigned a new proxy, while ensuring that no proxy is used more than three times a minute and that proxies are rotated in a circular, sequential manner. As far as I know, there is no other workaround for this.
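
For anyone wanting to try the same thing, the rotation described above could look roughly like the sketch below. This is not the code we actually ran: `ProxyRotator`, `AcquireAsync` and the constructor parameters are names made up for illustration, and a `SemaphoreSlim` is used here as a simple lock around the shared rotation state.

```csharp
using System;
using System.Collections.Generic;
using System.Threading;
using System.Threading.Tasks;

// Hypothetical helper: hands out proxies in circular order and never lets
// a single proxy be used more than maxUsesPerWindow times per window.
public sealed class ProxyRotator
{
    private readonly string[] _proxies;
    private readonly Queue<DateTime>[] _recentUses;   // use timestamps per proxy
    private readonly SemaphoreSlim _gate = new SemaphoreSlim(1, 1);
    private readonly int _maxUsesPerWindow;
    private readonly TimeSpan _window;
    private int _next;                                // where the next scan starts

    public ProxyRotator(IEnumerable<string> proxies, int maxUsesPerWindow, TimeSpan window)
    {
        _proxies = new List<string>(proxies).ToArray();
        if (_proxies.Length == 0)
            throw new ArgumentException("At least one proxy is required.", nameof(proxies));
        if (maxUsesPerWindow < 1)
            throw new ArgumentOutOfRangeException(nameof(maxUsesPerWindow));

        _maxUsesPerWindow = maxUsesPerWindow;
        _window = window;
        _recentUses = new Queue<DateTime>[_proxies.Length];
        for (int i = 0; i < _recentUses.Length; i++)
            _recentUses[i] = new Queue<DateTime>();
    }

    // Returns the next proxy that still has capacity in the current window,
    // waiting if every proxy has already been used up.
    public async Task<string> AcquireAsync()
    {
        while (true)
        {
            TimeSpan wait = TimeSpan.MaxValue;

            await _gate.WaitAsync();
            try
            {
                DateTime now = DateTime.UtcNow;
                for (int i = 0; i < _proxies.Length; i++)
                {
                    int index = (_next + i) % _proxies.Length;
                    Queue<DateTime> uses = _recentUses[index];

                    // Forget uses that have fallen out of the window.
                    while (uses.Count > 0 && now - uses.Peek() >= _window)
                        uses.Dequeue();

                    if (uses.Count < _maxUsesPerWindow)
                    {
                        uses.Enqueue(now);
                        _next = (index + 1) % _proxies.Length;
                        return _proxies[index];
                    }

                    // Track how long until the soonest proxy frees up.
                    TimeSpan untilFree = uses.Peek() + _window - now;
                    if (untilFree < wait)
                        wait = untilFree;
                }
            }
            finally
            {
                _gate.Release();
            }

            // Every proxy is at its limit: wait outside the lock, then retry.
            await Task.Delay(wait);
        }
    }
}
```

Each worker would call `await rotator.AcquireAsync()` before a request and assign the result with `request.Proxy = new WebProxy(proxyAddress)`. For the limit described above, the rotator would be created with `3` and `TimeSpan.FromMinutes(1)`.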

Syed Salman Akbar