I'd like to make a tool which accesses a search engine programmatically.

I've been enjoying using YQL recently and thought it might be useful since it can dig data out of HTML pages.

But I tried it with Google, Bing, and Yahoo search and they all seem to block YQL.

I wonder if there are some lesser-known web search sites that might work with YQL.

Or actually, if there's still any search engine which offers an API, that would be even better.

(In fact I'm only searching linguistics.stackexchange.com because the Stack Exchange APIs don't provide a way to search by text that I can find.)

hippietrail
  • 15,848
  • 18
  • 99
  • 158

1 Answer

Most search engines block screen scrapers and other automated agents. YQL is designed to respect a site's robots.txt file, so it won't work on the many sites that disallow such access.
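
To see why a robots.txt-respecting client gets turned away, here is a minimal sketch using Python's standard `urllib.robotparser`. The robots.txt content, the `ExampleBot` user agent, and the `example.com` URLs are all made up for illustration; they are not taken from any real search engine, and this is not how YQL itself implements the check.

```python
from urllib.robotparser import RobotFileParser

# Hypothetical robots.txt content, invented for this example only.
SAMPLE_ROBOTS_TXT = """\
User-agent: *
Disallow: /search
"""


def is_allowed(robots_txt: str, user_agent: str, url: str) -> bool:
    """Return True if the given robots.txt text permits user_agent to fetch url."""
    parser = RobotFileParser()
    # parse() takes an iterable of lines, so no network access is needed here.
    parser.parse(robots_txt.splitlines())
    return parser.can_fetch(user_agent, url)


if __name__ == "__main__":
    # "ExampleBot" and example.com are placeholders, not real agents or sites.
    for url in ("https://example.com/search?q=soccer", "https://example.com/about"):
        print(url, is_allowed(SAMPLE_ROBOTS_TXT, "ExampleBot", url))
```

A well-behaved agent would run a check like this before fetching a page and skip any URL for which it returns `False`, which is why results pages under a disallowed path are out of reach.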

Instead, I suggest moving a step above HTML screen scraping and using a published search API.

In YQL, for example, there is a table that provides access to Bing search results:

select * from microsoft.bing where query="soccer" and source in ("web","image")
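
As a rough sketch, a query like the one above could be turned into a request URL from Python as follows. The endpoint `https://query.yahooapis.com/v1/public/yql` and the `q` and `format` parameters are assumptions about the public YQL REST interface, and `build_yql_url` is a hypothetical helper name. The snippet only builds the URL; whether the service or the `microsoft.bing` table responds is a separate matter.

```python
from urllib.parse import urlencode

# Assumption: the public YQL REST endpoint; it may not be available.
YQL_ENDPOINT = "https://query.yahooapis.com/v1/public/yql"


def build_yql_url(query: str, fmt: str = "json") -> str:
    """Build a GET URL for a YQL statement, percent-encoding the query text."""
    return YQL_ENDPOINT + "?" + urlencode({"q": query, "format": fmt})


if __name__ == "__main__":
    statement = 'select * from microsoft.bing where query="soccer" and source in ("web","image")'
    print(build_yql_url(statement))
```

The resulting URL could then be fetched with any HTTP client; the encoding step matters because the statement contains spaces, quotes, and parentheses.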

You could also look at the Yahoo! BOSS API or use the Bing Search API directly.

BrianC
  • 10,591
  • 2
  • 30
  • 50
  • As for `microsoft.bing` it currently gives this error: `AppID is not functioning properly. Please refer to the HelpUrl to get more information.` – hippietrail Aug 31 '12 at 14:08
  • It looks like the Bing API 2.0 (which used AppID) has been deprecated as of Aug 1 2012. See the [Bing Developer Blog](http://www.bing.com/community/site_blogs/b/developer/archive/2012/05/17/bing-developer-update-2.aspx) which says "We encourage existing developers to begin transitioning to the Windows Azure Marketplace before Bing Search API 2.0 AppIDs are decommissioned on August 1, 2012. On and after this date, AppIDs will no longer return results." – BrianC Aug 31 '12 at 17:01
  • Bummer, so that's Google and Bing both out for free use. Nothing left? `)-:` – hippietrail Aug 31 '12 at 17:20