Questions tagged [robots.txt]

Robots.txt (the Robots Exclusion Protocol) is a text file placed in the root of a web site domain to give instructions to compliant web robots (such as search engine crawlers) about what pages to crawl and not crawl, as well as other information such as a Sitemap location. In modern frameworks it can be useful to programmatically generate the file. General questions about Search Engine Optimization are more appropriate on the Webmasters StackExchange site.
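Since the file can be generated programmatically, the following is a minimal sketch of one way to do it in Python. The function name `build_robots_txt` and its parameters are hypothetical, chosen only for illustration; they do not come from any particular framework.

```python
def build_robots_txt(disallow_paths, sitemap_url=None, user_agent="*"):
    """Return the text of a simple robots.txt file.

    Hypothetical helper for illustration; a real site would serve the
    returned string at /robots.txt with a text/plain content type.
    """
    lines = [f"User-agent: {user_agent}"]
    if disallow_paths:
        # One Disallow line per path prefix that robots should skip.
        lines += [f"Disallow: {path}" for path in disallow_paths]
    else:
        # An empty Disallow value means nothing is disallowed.
        lines.append("Disallow:")
    if sitemap_url:
        # The Sitemap line is independent of any User-agent group.
        lines += ["", f"Sitemap: {sitemap_url}"]
    return "\n".join(lines) + "\n"


if __name__ == "__main__":
    print(build_robots_txt(["/admin/"], "http://www.example.com/sitemap.xml"))
```

In a web framework, a route for `/robots.txt` would typically return this string as the response body.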

Website owners use the /robots.txt file to give instructions about their site to web robots; this is called The Robots Exclusion Protocol.

It works like this: a robot wants to visit a website URL, say http://www.example.com/welcome.html. Before it does so, it first checks for http://www.example.com/robots.txt, and finds:

User-agent: *
Disallow: /

The "User-agent: *" means this section applies to all robots. The "Disallow: /" tells the robot that it should not visit any pages on the site.
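The check a compliant robot performs can be sketched with Python's standard `urllib.robotparser` module. This is only an illustration: the helper name `is_allowed` and the user agent string `ExampleBot` are assumptions, and a real crawler would download the file from the site rather than use an inline string.

```python
from urllib.robotparser import RobotFileParser

# The example rules from above: every robot is asked to stay out entirely.
RULES = """\
User-agent: *
Disallow: /
"""


def is_allowed(robots_txt, user_agent, url):
    """Return True if robots_txt permits user_agent to fetch url.

    Hypothetical helper for illustration only.
    """
    parser = RobotFileParser()
    # parse() takes an iterable of lines, so no network access is needed.
    parser.parse(robots_txt.splitlines())
    return parser.can_fetch(user_agent, url)


if __name__ == "__main__":
    # "ExampleBot" is an assumed user agent name.
    print(is_allowed(RULES, "ExampleBot", "http://www.example.com/welcome.html"))
```

With the rules above, the check reports that the page must not be fetched. Replacing `Disallow: /` with an empty `Disallow:` would permit it.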

There are two important considerations when using /robots.txt:

Robots can ignore your /robots.txt. In particular, malware robots that scan the web for security vulnerabilities and email address harvesters used by spammers will pay no attention to it.
The /robots.txt file is publicly available. Anyone can see what sections of your server you don't want robots to use, so don't try to use /robots.txt to hide information.

More information can be found at http://www.robotstxt.org/.

1426 questions

votes

4 answers

How can I fix "Googlebot can't access your site" issue?

I just keep getting a message about "Over the last 24 hours, Googlebot encountered 1 errors while attempting to access your robots.txt. To ensure that we didn't crawl any pages listed in that file, we postponed our crawl. Your site's overall…

search gwt robots.txt

asked Aug 18 '14 at 03:11

Jason

votes

3 answers

How to add route to dynamic robots.txt in ASP.NET MVC?

I have a robots.txt that is not static but generated dynamically. My problem is creating a route from root/robots.txt to my controller action. This works: routes.MapRoute( name: "Robots", url: "robots", defaults: new { controller = "Home", action =…

asp.net-mvc asp.net-mvc-routing robots.txt

asked Jun 18 '13 at 04:49

JSS

votes

1 answer

Any reason to not do a 301 on favicon.ico, apple-touch-icon, and robots.txt?

I would like to redirect requests for these resources to my CDN. Is there any reason to not do this?

favicon robots.txt apple-touch-icon

asked Apr 04 '12 at 18:46

John Bachir

votes

3 answers

block google robots for URLS containing a certain word

My client has a load of pages which they don't want indexed by Google. They are all called http://example.com/page-xxx, so they are /page-123 or /page-2 or /page-25, etc. Is there a way to stop Google indexing any page that starts with /page-xxx…

robots.txt

asked Jul 28 '11 at 13:03

JorgeLuisBorges

votes

2 answers

Excluding testing subdomain from being crawled by search engines (w/ SVN Repository)

I have: domain.com testing.domain.com I want domain.com to be crawled and indexed by search engines, but not testing.domain.com The testing domain and main domain share the same SVN repository, so I'm not sure if separate robots.txt files would…

.htaccess mod-rewrite web-crawler robots.txt google-search-console

asked Jul 18 '11 at 20:18

Eric

votes

3 answers

Anybody got any C# code to parse robots.txt and evaluate URLs against it

Short question: Has anybody got any C# code to parse robots.txt and then evaluate URLs against it to see if they would be excluded or not? Long question: I have been creating a sitemap for a new site yet to be released to Google. The sitemap has two…

c# robots.txt

asked Mar 11 '09 at 05:47

Simon_Weaver

votes

1 answer

Regexp for robots.txt

I am trying to set up my robots.txt, but I am not sure about the regexps. I've got four different pages all available in three different languages. Instead of listing each page times 3, I figured I could use a regexp. nav.aspx page.aspx/changelang…

regex robots.txt

asked Jun 10 '11 at 13:05

patad

votes

4 answers

Unable to map route for robots.txt in asp.net mvc

I am developing an ASP.NET MVC application. I am creating a robots.txt for my application to keep bots away, because my current site is getting many robot requests. So I found this link, Robots.txt file in MVC.NET 4, to create robots.txt. But I when…

asp.net-mvc routes asp.net-mvc-routing robots.txt

asked Aug 07 '16 at 05:58

Wai Yan Hein

votes

3 answers

What does "Allow: /$" mean in robots.txt

When digging through a Google robots.txt file I noticed a line that I was not familiar with. What does the below code mean in the context of a robots.txt file? Allow: /$ Does the '$' change the meaning any from simply saying Allow: /

web-crawler robots.txt

asked Jan 19 '16 at 14:37

Kyle Piira

votes

4 answers

Googlebots Ignoring robots.txt?

I have a site with the following robots.txt in the root: User-agent: * Disabled: / User-agent: Googlebot Disabled: / User-agent: Googlebot-Image Disallow: / And pages within this site are getting scanned by Googlebots all day long. Is there…

robots.txt googlebot

asked Dec 05 '08 at 18:08

Tim Scott

votes

1 answer

Disallow certain page directories but NOT that page itself

Let's say, I have a dynamic page that creates URL's from user inputs. For example: www.XXXXXXX.com/browse <-------- (Browse being the page) Every time user enters some query, it generates more pages. For example: www.XXXXXXX.com/browse/abcd…

robots.txt

asked Dec 25 '15 at 20:29

Raj Sandhu

votes

4 answers

Where to put robots.txt file?

Where should I put robots.txt? domainname.com/robots.txt or domainname/public_html/robots.txt? I placed the file in domainname.com/robots.txt, but it's not opening when I type this in the browser…

seo web-hosting robots.txt

asked Jun 06 '10 at 14:49

Jitendra Vyas

votes

3 answers

robots.txt allow all except few sub-directories

I want my site to be indexed in search engines, except for a few sub-directories. Following are my robots.txt settings: robots.txt in the root directory User-agent: * Allow: / Separate robots.txt in the sub-directory (to be excluded) User-agent:…

seo search-engine cpanel robots.txt shared-hosting

asked Feb 13 '15 at 09:09

Kunwarbir S.

votes

3 answers

Need to block subdomain using robots.txt which is on same directory level

I have one problem: I have two domain names, for example www.testing.com and new.testing.com, and I do not want new.testing.com to be displayed in any search engine. I have added a robots.txt to new.testing.com. And both sites have the same parent…

seo robots.txt

asked Sep 19 '14 at 07:58

Jalpesh Patel

votes

2 answers

Block a site from search engine - DuckDuckGo

I have a development site, https://text-domain.com (not a real site). When I go to https://duckduckgo.com and search for text-domain.com, it does return results. What I have tried so far: Created a robots.txt file with the following code (put it in my root…

web-crawler robots.txt duckduckgo

asked Aug 06 '13 at 12:03

Vimalnath
