Maybe this is a stupid question, but...

I am working with a company that said they needed to get "permission" to crawl other people's sites. They have a Google Search Appliance and some Google Minis, and they want to point them at other sites to aggregate content. The end result will be something like a targeted search engine. (All the indexed sites relate to a specific topic.)

The only thing they will be doing is:

  1. Indexing content from the other sites/domains
  2. Providing search functionality on their own site that searches the indexed content (like Google, displaying summaries and not the full content)
  3. The search results will provide links back to the original content

Their intent is not malicious; it is to provide a single site/resource for people to reference on their given topic.

Is there anything illegal or fishy about this process?

John B

1 Answer

It should be fine as long as your crawler respects each site's robots.txt file.

Searching Google for "robots.txt" will give you a lot of information.

Briefly, it is a file that tells a crawler/robot how it may access the site: which content is allowed or disallowed and, through extensions that not every crawler honors, things like access rate and time of day.
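If you end up writing or auditing a crawler yourself, here is a minimal sketch of what "respecting robots.txt" can look like, using Python's standard `urllib.robotparser`. The robots.txt content, the `ExampleTopicBot` user agent, and the example.com URLs are all made up for illustration; the Google Search Appliance has its own crawler settings, which this does not cover.

```python
from urllib.robotparser import RobotFileParser

# Hypothetical robots.txt content, inlined so the sketch runs offline.
# A real crawler would fetch https://<site>/robots.txt instead.
ROBOTS_TXT = """\
User-agent: *
Disallow: /private/
Crawl-delay: 10
"""

# Hypothetical user-agent name for the crawler.
AGENT = "ExampleTopicBot"


def build_parser(robots_txt: str) -> RobotFileParser:
    """Parse robots.txt text into a RobotFileParser."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    return parser


parser = build_parser(ROBOTS_TXT)

# Ask before fetching: is this URL allowed for our user agent?
print(parser.can_fetch(AGENT, "https://example.com/articles/1"))
print(parser.can_fetch(AGENT, "https://example.com/private/report"))

# Crawl-delay is a non-standard extension; this returns None when absent.
print(parser.crawl_delay(AGENT))
```

The crawler calls `can_fetch` before every request and sleeps for the `crawl_delay` value (when one is given) between requests to the same site.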

ManiacZX
  • Right, robots.txt contains the "rules" for indexing a site. Not that I would, but is there anything "official" saying I have to respect that file? (Other than not wanting to be a dick) – John B Jul 23 '10 at 20:56
  • You can't be forced to use it, as the implementation is on the crawler side, not the server side. As far as legality goes, not that I have heard of, but I've never investigated it. AFAIK it is purely a community-created standard to help everyone play nice with each other. – ManiacZX Jul 23 '10 at 22:07
  • Ignoring robots.txt is, IMO, crossing the line into unethical behavior that is probably in "abuse" territory. Think of it as having permission to use my pool, then abusing that permission by bringing a rowdy party that trashes my house. If you do it to the wrong entity (government, bank, etc.), you could be in legal trouble. – duffbeer703 Jul 26 '10 at 16:23