Questions tagged [robots.txt]

Robots.txt (the Robots Exclusion Protocol) is a text file placed at the root of a website's domain to give instructions to compliant web robots (such as search engine crawlers) about which pages to crawl and which to leave alone, as well as other information such as the location of a Sitemap. In modern frameworks it can be useful to generate the file programmatically. General questions about Search Engine Optimization are more appropriate on the Webmasters StackExchange site.

Website owners use the /robots.txt file to give instructions about their site to web robots; this is called The Robots Exclusion Protocol.

It works like this: a robot wants to visit a website URL, say http://www.example.com/welcome.html. Before it does so, it first checks for http://www.example.com/robots.txt, and finds:

User-agent: *
Disallow: /

The "User-agent: *" means this section applies to all robots. The "Disallow: /" tells the robot that it should not visit any pages on the site.

There are two important considerations when using /robots.txt:

  • Robots can ignore your /robots.txt. In particular, malware robots that scan the web for security vulnerabilities, and email address harvesters used by spammers, will pay no attention to it.
  • The /robots.txt file is publicly available. Anyone can see which sections of your server you don't want robots to use, so don't try to use /robots.txt to hide information.

More information can be found at http://www.robotstxt.org/.

1426 questions
219 votes · 3 answers

Can a relative sitemap url be used in a robots.txt?

In robots.txt can I write the following relative URL for the sitemap file? sitemap: /sitemap.ashx Or do I have to use the complete (absolute) URL for the sitemap file, like: sitemap: http://subdomain.domain.com/sitemap.ashx Why I wonder: I own a…
Easyrider
  • 3,199
  • 5
  • 22
  • 32
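A note on the question above: the sitemaps.org protocol describes the Sitemap directive as taking the full URL of the sitemap file, so the absolute form from the question is the safe choice. A minimal sketch reusing the asker's host:

# Sitemap lines are independent of any User-agent group and can appear anywhere in the file.
Sitemap: http://subdomain.domain.com/sitemap.ashx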
161 votes · 5 answers

How to configure robots.txt to allow everything?

My robots.txt in Google Webmaster Tools shows the following values: User-agent: * Allow: / What does it mean? I don't have enough knowledge about it, so looking for your help. I want to allow all robots to crawl my website, is this the right…
Raajpoot
  • 1,611
  • 2
  • 10
  • 3
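For the question above, a sketch of the two usual ways to let every compliant robot crawl everything: an empty Disallow (the original standard) or Allow: / (an extension most major crawlers understand):

User-agent: *
Disallow:

# or, equivalently, for crawlers that support the Allow extension:
User-agent: *
Allow: /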
101 votes · 10 answers

Static files in Flask - robot.txt, sitemap.xml (mod_wsgi)

Is there any clever solution to store static files in Flask's application root directory. robots.txt and sitemap.xml are expected to be found in /, so my idea was to create routes for them: @app.route('/sitemap.xml', methods=['GET']) def sitemap(): …
biesiad
  • 2,258
  • 4
  • 19
  • 16
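One commonly used sketch for the Flask question above, assuming robots.txt and sitemap.xml are stored in the app's static/ folder (the file locations are an assumption, not something the question states):

from flask import Flask, request, send_from_directory

app = Flask(__name__)

@app.route('/robots.txt')
@app.route('/sitemap.xml')
def static_from_root():
    # Serve the requested file from the static folder, stripping the leading slash.
    return send_from_directory(app.static_folder, request.path[1:])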
99 votes · 3 answers

Ignore URLs in robot.txt with specific parameters?

I would like Google to ignore URLs like this: http://www.mydomain.example/new-printers?dir=asc&order=price&p=3 In other words, all the URLs that have the parameters dir, order and price should be ignored. How do I do so with robots.txt?
Luis Valencia
  • 32,619
  • 93
  • 286
  • 506
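A hedged sketch for the question above: Googlebot understands * wildcards in Disallow rules (an extension beyond the original standard), so rules like these would block URLs carrying the dir, order, or p parameters from the sample URL. Such patterns match anywhere in the query string and may over-match:

User-agent: Googlebot
# * matches any run of characters in Google's extended syntax.
Disallow: /*?*dir=
Disallow: /*?*order=
Disallow: /*?*p=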
83 votes · 9 answers

What is the smartest way to handle robots.txt in Express?

I'm currently working on an application built with Express (Node.js) and I want to know what is the smartest way to handle different robots.txt for different environments (development, production). This is what I have right now but I'm not convinced…
Vinch
  • 1,551
  • 3
  • 13
  • 15
78 votes · 5 answers

How to stop Google indexing my Github repository

I use Github to store the text of one of my web sites, but the problem is Google indexing the text in Github as well. So the same text will show up both on my site and on Github. e.g. this search The top hit is my site. The second hit is the Github…
szabgab
  • 6,202
  • 11
  • 50
  • 64
76 votes · 9 answers

Stop Google from indexing

Is there a way to stop Google from indexing a site?
Developer
  • 17,809
  • 26
  • 66
  • 92
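For the question above, the robots.txt part of the usual answer is a blanket Disallow for Googlebot (or for all agents). Note that this only stops crawling; getting already-indexed URLs removed generally also needs a noindex meta tag or header on the pages themselves:

User-agent: Googlebot
Disallow: /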
59 votes · 4 answers

robots.txt to disallow all pages except one? Do they override and cascade?

I want one page of my site to be crawled and no others. Also, if it's any different than the answer above, I would also like to know the syntax for disallowing everything but the root (index) of the website is. # robots.txt for…
nouveau
  • 1,162
  • 1
  • 8
  • 14
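A sketch for the question above, using the Allow extension that Google and Bing honor (the page path is a placeholder). The most specific matching rule wins, so the single page stays crawlable while everything else is blocked:

User-agent: *
Disallow: /
Allow: /the-one-page.html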
56 votes · 2 answers

robots.txt allow root only, disallow everything else?

I can't seem to get this to work but it seems really basic. I want the domain root to be crawled http://www.example.com But nothing else to be crawled and all subdirectories are dynamic http://www.example.com/* I tried User-agent: * Allow:…
cotopaxi
  • 897
  • 1
  • 11
  • 19
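A common sketch for the question above relies on the $ end-of-URL anchor, which Google and Bing support but which is not part of the original standard, so only the bare root stays crawlable:

User-agent: *
Allow: /$
Disallow: /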
55 votes · 1 answer

robots.txt and .htaccess syntax highlight

Is there a way to colorcode/highlight robots.txt and .htaccess syntax? E.g. with a SublimeText2 plug-in. I found this, but can't figure out how to install it: https://github.com/shellderp/sublime-robot-plugin
Geo
  • 12,666
  • 4
  • 40
  • 55
51 votes · 5 answers

Multiple Sitemap: entries in robots.txt?

I have been searching around using Google but I can't find an answer to this question. A robots.txt file can contain the following line: Sitemap: http://www.mysite.com/sitemapindex.xml but is it possible to specify multiple sitemap index files in…
user306942
  • 815
  • 2
  • 8
  • 6
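For the question above: the sitemaps.org protocol allows several Sitemap lines in a single robots.txt, so a sketch with hypothetical file names on the asker's host would be:

Sitemap: http://www.mysite.com/sitemap-pages.xml
Sitemap: http://www.mysite.com/sitemap-posts.xml
# Alternatively, point a single Sitemap line at a sitemap index file that lists the others.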
44 votes · 6 answers

What is the use of the hackers.txt file?

First No I am not asking you to teach me hacking, I am just curious about this file and its content. My journey When I dived into the new HTML5 Boilerplate I came accross the humans.txt. I googled for it and I came at this site…
Ron van der Heijden
  • 14,803
  • 7
  • 58
  • 82
37 votes · 3 answers

How do I disallow specific page from robots.txt

I am creating two pages on my site that are very similar but serve different purposes. One is to thank users for leaving a comment and the other is to encourage users to subscribe. I don't want the duplicate content but I do want the pages to be…
Daniel
  • 6,758
  • 6
  • 31
  • 29
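A sketch for the question above with placeholder paths; keep in mind that Disallow matches by prefix, so a rule also covers anything nested under the listed path:

User-agent: *
Disallow: /thank-you-for-commenting
Disallow: /please-subscribe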
35 votes · 10 answers

Ethics of robots.txt

I have a serious question. Is it ever ethical to ignore the presence of a robots.txt file on a website? These are some of the considerations I've got in mind: If someone puts a web site up they're expecting some visits. Granted, web crawlers are…
Onorio Catenacci
  • 14,928
  • 14
  • 81
  • 132
34 votes · 2 answers

django serving robots.txt efficiently

Here is my current method of serving robots.txt url(r'^robots\.txt/$', TemplateView.as_view(template_name='robots.txt', content_type='text/plain')), I don't think that this is the best way. I think it…
Lucas Ou-Yang
  • 5,505
  • 13
  • 43
  • 62
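One widely used alternative sketch for the Django question above: return the file's contents directly from a plain view instead of rendering a template, which keeps the template engine out of the request entirely (the URL pattern and rules shown are placeholders):

from django.http import HttpResponse
from django.urls import path
from django.views.decorators.http import require_GET

@require_GET
def robots_txt(request):
    # Build the response body in code; no template lookup or rendering involved.
    lines = [
        "User-agent: *",
        "Disallow: /admin/",
    ]
    return HttpResponse("\n".join(lines), content_type="text/plain")

urlpatterns = [
    path("robots.txt", robots_txt),
]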