I am creating a tool for Web Developers which will 'scan' the HTML on their site.

This requires my Asp.Net application to download a page from their site.

To protect from abuse, I want to make sure I obey robots.txt, among other methods.

Does HttpWebRequest do this already? Or is there an open source implementation I can use to validate a robots.txt file given a user-agent string?

Scen

1 Answer


HttpWebRequest just makes a single HTTP request; it doesn't obey robots.txt because it doesn't even know the file exists. Honoring it would require an extra request to fetch and parse robots.txt, and if the class did that by default, every single-page fetch would incur 50% overhead.

For a C# implementation, have a look here: Anybody got any C# code to parse robots.txt and evaluate URLS against it
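If you want something self-contained to start from, here's a minimal sketch of fetching and checking robots.txt by hand. It is an assumption-laden illustration, not a full implementation of the spec: it handles only `User-agent` and `Disallow` prefix rules, ignoring wildcards, `Allow` precedence, and `Crawl-delay`.

```csharp
using System;
using System.IO;
using System.Net;

// Illustrative sketch only: a simplified robots.txt check.
// Assumes plain User-agent / Disallow prefix matching (no wildcards,
// no Allow rules, no Crawl-delay handling).
public static class RobotsChecker
{
    public static bool IsAllowed(Uri pageUri, string userAgent)
    {
        var robotsUri = new Uri(
            pageUri.GetLeftPart(UriPartial.Authority) + "/robots.txt");

        string robotsTxt;
        try
        {
            var request = (HttpWebRequest)WebRequest.Create(robotsUri);
            request.UserAgent = userAgent;
            using (var response = (HttpWebResponse)request.GetResponse())
            using (var reader = new StreamReader(response.GetResponseStream()))
                robotsTxt = reader.ReadToEnd();
        }
        catch (WebException)
        {
            // No robots.txt (404) or unreachable: conventionally treated
            // as "everything allowed".
            return true;
        }

        bool appliesToUs = false;
        foreach (var rawLine in robotsTxt.Split('\n'))
        {
            var line = rawLine.Split('#')[0].Trim(); // strip comments

            if (line.StartsWith("User-agent:", StringComparison.OrdinalIgnoreCase))
            {
                var agent = line.Substring("User-agent:".Length).Trim();
                appliesToUs = agent == "*" ||
                    userAgent.IndexOf(agent, StringComparison.OrdinalIgnoreCase) >= 0;
            }
            else if (appliesToUs &&
                     line.StartsWith("Disallow:", StringComparison.OrdinalIgnoreCase))
            {
                var path = line.Substring("Disallow:".Length).Trim();
                if (path.Length > 0 && pageUri.AbsolutePath.StartsWith(path))
                    return false;
            }
        }
        return true;
    }
}
```

You'd call `RobotsChecker.IsAllowed(uri, "MyScanner/1.0")` before downloading each page, and ideally cache the robots.txt body per host so you don't pay the extra request on every fetch.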

Jon