
It seems surprisingly difficult to run search queries programmatically via an API against the major engines.

  • Surprisingly, Google doesn't appear to have a general-purpose API for its search. They have a "custom search engine", which is designed for adding a Google-powered search box to a given site and returns results only from a few domains. Their signup page demands that you enter the sites to search. I tried entering ".google.com/" and some variations here, but that doesn't give me the same results as the web search on some obscure terms I care about (in particular, I get no hits where the web search does return results).
  • Bing search does have an API, but unlike the web results, the API doesn't report the total number of hits. Getting the total number of results is a requirement for my application.
  • DuckDuckGo has an API, but it doesn't seem to query the same database as the web search.
  • Blekko has an API, but it's rate-limited to 1 request/second. I haven't asked what their pricing structure is.
  • I haven't tried Yahoo.

Note that I'm happy to pay for this, but I still can't find a suitable service. Any help is appreciated.

Kijewski
Wolfram Arnold

2 Answers


The blekko API is free up to 1 query per second. Depending on what you're doing, you might find that we allow you to do things you can't do elsewhere. See http://help.blekko.com/index.php/advanced-search-features/ for some details. Contact apiauth@blekko.com for an API Auth key and documentation.
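Staying under a 1 query/second allowance is easiest with a client-side throttle. The sketch below is a generic Python helper, not part of any blekko client; the class name `Throttle` is hypothetical, and it assumes the limit is a simple minimum spacing between requests (the server's actual policy may differ).

```python
import time


class Throttle:
    """Space calls at least `min_interval` seconds apart (hypothetical helper).

    Assumption: the server limit is a plain "one request per second" rule.
    """

    def __init__(self, min_interval=1.0, clock=time.monotonic, sleep=time.sleep):
        self.min_interval = min_interval
        self._clock = clock    # injectable so the timing can be tested
        self._sleep = sleep
        self._last = None      # time of the previous request, None before the first

    def wait(self):
        """Block until it is safe to issue the next request."""
        now = self._clock()
        if self._last is not None:
            remaining = self.min_interval - (now - self._last)
            if remaining > 0:
                self._sleep(remaining)
                now = self._clock()
        self._last = now
```

Call `throttle.wait()` immediately before each API request. The `clock` and `sleep` arguments default to the real time functions and exist only so the spacing can be checked without actually sleeping.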

Greg Lindahl

After more research and experimentation, I can say that:

  • The Yahoo BOSS Search API works best. For general web search they charge $0.80 per 1,000 calls, starting from the first call. The API returns JSON, including the total number of results. It seems to have the same coverage as the website and is easy enough to use, but requests must be signed with OAuth (no token required, just a signature); some sample Ruby code did the trick and can be adapted to use the OAuth gem. Each request returns at most 50 results and comes with pagination flags for retrieving more results with separate queries (each of which is billed).
  • The Google Custom Search API, once you get past the setup screen (just enter google.com as the site to search), has a dashboard setting that toggles "general" web search in addition to the custom sites. The API also returns JSON, requires only an API key in the request (no token or signature), and returns at most 10 results per request, but it also reports the total number of results. They charge $5 for 5,000 API calls after 100 free calls. Most frustratingly, the Custom Search API appears to use a different database and doesn't return the same results as the web search; many queries that do get hits on the web come up blank in the API.
  • The Bing API isn't really worth the trouble. I also think Yahoo uses Bing on the back end, but the Yahoo interface is easier to use and more complete, and its docs are better. Bing's API docs are downloadable Word documents (!!!), even though their search offering is now part of the Azure cloud, and their site navigation is the most obscure of the three.
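Because both paid APIs cap the results per request (50 for BOSS, 10 for Google Custom Search) and bill per call, it helps to estimate how many requests a query will take before running it. The helper names below are hypothetical, and the prices plugged in are the ones quoted in this answer ($0.80 per 1,000 BOSS calls from the first call; $5 per 5,000 Google calls after 100 free ones, i.e. $1 per 1,000), which may have changed since.

```python
import math


def page_offsets(wanted, page_size, first=0):
    """Start offsets for the requests needed to fetch `wanted` results.

    `first` is the offset of the very first result (0 or 1, depending on the API).
    """
    calls = math.ceil(wanted / page_size)
    return [first + i * page_size for i in range(calls)]


def estimated_cost(calls, price_per_1000, free_calls=0):
    """Dollar cost of `calls` requests when only calls beyond `free_calls` are billed."""
    billable = max(0, calls - free_calls)
    return billable * price_per_1000 / 1000
```

For example, fetching 120 results from BOSS takes three calls at offsets 0, 50 and 100, so `estimated_cost(3, 0.80)` comes to about $0.0024. In practice you would also stop paging once the reported total number of results is exhausted, since every extra call is billed.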

Update: Greg Lindahl at Blekko responded to us personally and invited us to a meeting. They were very accommodating in sharing some of their data and also gave us an API key. That's another good option, especially for specialty data sets, which is what we needed in our case.

Wolfram Arnold