0

I am looking for a way to retrieve the number of search results (as shown on Google result pages) for a given query. The aim is to implement the normalized Google distance (http://iknowate.blogspot.com/2011/10/google-similarity-distance.html) using a search API. The main problem is that the number of requests shouldn't be too limited (the Google API seems to allow only ~100 queries per day).

Maybe someone could give me a hint on how I could retrieve this information.
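
For context, here is a minimal sketch of the calculation the hit counts would feed into, assuming the usual definition of the normalized Google distance: NGD(x, y) = (max(log f(x), log f(y)) - log f(x, y)) / (log N - min(log f(x), log f(y))). The function name and the parameters `fx`, `fy`, `fxy`, and `n` are only illustrative, and `n` (the number of pages indexed) has to be estimated, since no API reports it directly.

```python
import math


def normalized_google_distance(fx, fy, fxy, n):
    """Compute NGD from hit counts.

    fx, fy -- hit counts for each term on its own
    fxy    -- hit count for a query containing both terms
    n      -- assumed total number of indexed pages (must exceed fx and fy)
    """
    # If a term (or the pair) never occurs, treat the terms as maximally distant.
    if fx <= 0 or fy <= 0 or fxy <= 0:
        return math.inf
    log_fx, log_fy, log_fxy = math.log(fx), math.log(fy), math.log(fxy)
    return (max(log_fx, log_fy) - log_fxy) / (math.log(n) - min(log_fx, log_fy))
```

Each distance needs three hit counts (one per term and one for the pair), which is why a limit of ~100 queries per day is used up so quickly.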

MrMuh

1 Answer

0

You could either use a third-party library or class to scrape the results page and traverse the DOM to get the count, or use file_get_contents to fetch the page and preg_match to extract the total number of results. Another option is to fetch the page with cURL, which also lets you send requests under different user agents to reduce the risk of a ban if you intend to scrape pages repeatedly.
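
As a rough illustration of the regex step (the equivalent of the preg_match approach, written in Python here), the sketch below pulls the count out of HTML that has already been fetched. The pattern assumes the page contains text like "About 1,230,000 results"; that wording is an assumption, it differs by locale, and it can change at any time. The function name is hypothetical.

```python
import re

# Assumed wording of the result-count line; adjust to the markup actually returned.
RESULT_COUNT_RE = re.compile(r"(\d[\d,.]*)\s+results", re.IGNORECASE)


def extract_result_count(html):
    """Return the result count found in html as an int, or None if absent."""
    match = RESULT_COUNT_RE.search(html)
    if match is None:
        return None
    # Drop thousands separators such as "," or "." before converting.
    digits = re.sub(r"\D", "", match.group(1))
    return int(digits)
```

For example, `extract_result_count("About 1,230,000 results (0.42 seconds)")` gives `1230000`, and a page without a matching line gives `None`, so the caller can detect a block page or a layout change.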

  • I think an automatic scraper would be blocked after a few thousand requests per day (from the same IP) due to Google's TOS, so it doesn't seem to be a long-term solution for this problem. Meanwhile, I found the Yahoo BOSS API http://developer.yahoo.com/search/boss/ as a possible solution (not free). – MrMuh May 31 '12 at 18:44