At the moment I am crawling a large number of predefined sites, looking for a very small number of particular documents of interest. Importantly, I am not crawling these sites to create my own search engine: it is specifically for retrieving the documents.
All of the major search engines have an API that I don't mind paying for, but their terms seem to be focused on using the API to build your own search engine.
For example: Yahoo BOSS TOS at http://info.yahoo.com/legal/us/yahoo/boss/tou/ . B.1(a) says "You are permitted to use the Services only for the purpose of incorporating and displaying Results from the Services as part of a Search Product deployed on Your Offering". So I can only use it for my own search engine.
Google only has the Custom Search Engine stuff, which again is not what I need.
Bing's API seems to be closer to what I need, but its TOS requires that certain pieces of information not be removed, etc. Then again, it doesn't require me to use it only for implementing my own search engine (from what I can see).
Am I reading too much into this, or is there a search engine that allows me to essentially use the results of its crawl of certain sites instead of my own for my product? Again, the search results themselves are not my product: it's what I do with the data in the documents that is.
Thanks for any tips.