I am creating a tool for Web Developers which will 'scan' the HTML on their site.

This requires my Asp.Net application to download a page from their site.

To protect from abuse, I want to make sure I obey robots.txt, among other methods.

Does HttpWebRequest do this already? Or is there an open source implementation I can use to validate a robots.txt file given a user-agent string?

Scen

1 Answer


HttpWebRequest just makes a single HTTP request; it doesn't obey robots.txt because it doesn't even know the file exists. Honoring it would require an extra request to fetch and parse robots.txt, and if the class did that by default, every single-page fetch would incur 50% overhead.

For a C# implementation, have a look here: Anybody got any C# code to parse robots.txt and evaluate URLS against it
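If you want something self-contained to start from, here's a minimal sketch of fetching and checking robots.txt by hand. It is an assumption-laden illustration, not a full implementation of the spec: it handles only `User-agent` and `Disallow` prefix rules, ignoring wildcards, `Allow` precedence, and `Crawl-delay`.

```csharp
using System;
using System.IO;
using System.Net;

// Illustrative sketch only: a simplified robots.txt check.
// Assumes plain User-agent / Disallow prefix matching (no wildcards,
// no Allow rules, no Crawl-delay handling).
public static class RobotsChecker
{
    public static bool IsAllowed(Uri pageUri, string userAgent)
    {
        var robotsUri = new Uri(
            pageUri.GetLeftPart(UriPartial.Authority) + "/robots.txt");

        string robotsTxt;
        try
        {
            var request = (HttpWebRequest)WebRequest.Create(robotsUri);
            request.UserAgent = userAgent;
            using (var response = (HttpWebResponse)request.GetResponse())
            using (var reader = new StreamReader(response.GetResponseStream()))
                robotsTxt = reader.ReadToEnd();
        }
        catch (WebException)
        {
            // No robots.txt (404) or unreachable: conventionally treated
            // as "everything allowed".
            return true;
        }

        bool appliesToUs = false;
        foreach (var rawLine in robotsTxt.Split('\n'))
        {
            var line = rawLine.Split('#')[0].Trim(); // strip comments

            if (line.StartsWith("User-agent:", StringComparison.OrdinalIgnoreCase))
            {
                var agent = line.Substring("User-agent:".Length).Trim();
                appliesToUs = agent == "*" ||
                    userAgent.IndexOf(agent, StringComparison.OrdinalIgnoreCase) >= 0;
            }
            else if (appliesToUs &&
                     line.StartsWith("Disallow:", StringComparison.OrdinalIgnoreCase))
            {
                var path = line.Substring("Disallow:".Length).Trim();
                if (path.Length > 0 && pageUri.AbsolutePath.StartsWith(path))
                    return false;
            }
        }
        return true;
    }
}
```

You'd call `RobotsChecker.IsAllowed(uri, "MyScanner/1.0")` before downloading each page, and ideally cache the robots.txt body per host so you don't pay the extra request on every fetch.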

Jon