
I would like to scrape a website. Its robots.txt file contains the following, but I'm not exactly sure what they don't want me to do:

User-agent: *
Disallow: /click

There is no click subdirectory. Or do they mean that I shouldn't access anything that would normally require clicking (like submitting data via a form)? They certainly aren't making it easy in any case: the main page's form GETs to a page that sets a cookie, which is then read by a third page.

user984003

1 Answer


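It means that no bot (User-agent: * addresses all of them) should crawl any URL whose path starts with the string /click. The rule is about URL paths; it has nothing to do with clicking in the browser.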

For example, the following URLs should be blocked:

  • example.com/click
  • example.com/click.html
  • example.com/click/
  • example.com/click/foo/bar
  • example.com/clicker

The following URLs would still be allowed:

  • example.com/foo/click
  • example.com/fooclick
  • example.com/clic
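The matching rule above is a plain string-prefix test on the URL path. A minimal Python sketch, for illustration only: the helper name is_blocked is hypothetical, a scheme is added to the example URLs so they parse as URLs, and only the simple prefix matching of the original spec is modelled (no wildcards or Allow lines).

```python
from urllib.parse import urlsplit

# The single rule from the question's robots.txt: "Disallow: /click"
DISALLOWED_PREFIX = "/click"


def is_blocked(url: str, prefix: str = DISALLOWED_PREFIX) -> bool:
    """Return True if the URL's path starts with the disallowed prefix.

    Hypothetical helper: plain prefix matching as in the original
    robots.txt spec; it ignores wildcards, Allow lines and encoding.
    """
    path = urlsplit(url).path or "/"  # an empty path means the site root
    return path.startswith(prefix)


blocked = [
    "http://example.com/click",
    "http://example.com/click.html",
    "http://example.com/click/",
    "http://example.com/click/foo/bar",
    "http://example.com/clicker",
]
allowed = [
    "http://example.com/foo/click",
    "http://example.com/fooclick",
    "http://example.com/clic",
]

for url in blocked + allowed:
    print(url, "blocked" if is_blocked(url) else "allowed")
```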

You can find the original robots.txt specification at http://www.robotstxt.org/wc/robots.html.
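If the scraper is written in Python, the standard library's urllib.robotparser can apply these rules for you. A minimal sketch that feeds it the robots.txt from the question directly, so no network access is needed; the user-agent name MyScraper is a hypothetical placeholder (User-agent: * matches any name).

```python
from urllib.robotparser import RobotFileParser

# The robots.txt from the question, supplied inline instead of fetched.
ROBOTS_TXT = """\
User-agent: *
Disallow: /click
"""

parser = RobotFileParser()
parser.parse(ROBOTS_TXT.splitlines())

# Hypothetical user-agent name; the "*" group applies to it.
AGENT = "MyScraper"

for url in ("https://example.com/click/foo/bar", "https://example.com/foo/click"):
    # can_fetch() returns False for disallowed URLs, True otherwise.
    print(url, parser.can_fetch(AGENT, url))
```

To check a live site instead, call parser.set_url() with the site's robots.txt URL and then parser.read() before using can_fetch().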

unor