Questions tagged [robots.txt]

robots.txt is the text file used by the Robots Exclusion Protocol. It is placed in the root of a website's domain to tell compliant web robots (such as search engine crawlers) which pages to crawl and which to skip, along with other information such as a Sitemap location. In modern frameworks it can be useful to generate the file programmatically. General questions about Search Engine Optimization are more appropriate on the Webmasters Stack Exchange site.
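As a rough illustration of generating the file programmatically, here is a minimal Python sketch. The function name `build_robots_txt`, its parameters, and the example paths and Sitemap URL are all hypothetical and chosen only for illustration; a real framework would typically serve the resulting string at /robots.txt.

```python
def build_robots_txt(disallowed_paths, sitemap_url=None, user_agent="*"):
    """Return the text of a simple robots.txt file (hypothetical helper)."""
    lines = [f"User-agent: {user_agent}"]
    if disallowed_paths:
        # One Disallow line per path prefix that robots should skip.
        lines.extend(f"Disallow: {path}" for path in disallowed_paths)
    else:
        # An empty Disallow value means nothing is disallowed.
        lines.append("Disallow:")
    if sitemap_url:
        lines.append("")
        lines.append(f"Sitemap: {sitemap_url}")
    return "\n".join(lines) + "\n"


# Example paths and Sitemap URL are assumptions for this sketch.
robots_txt = build_robots_txt(
    ["/admin/", "/tmp/"],
    sitemap_url="http://www.example.com/sitemap.xml",
)
print(robots_txt)
```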

Website owners use the /robots.txt file to give instructions about their site to web robots; this is called The Robots Exclusion Protocol.

It works like this: a robot wants to visit a website URL, say http://www.example.com/welcome.html. Before it does so, it first checks for http://www.example.com/robots.txt, and finds:

User-agent: *
Disallow: /

The "User-agent: *" means this section applies to all robots. The "Disallow: /" tells the robot that it should not visit any pages on the site.
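The check described above can be sketched with Python's standard `urllib.robotparser` module. This is only an illustration of what a compliant robot might do: the rules are parsed from a string instead of being fetched over the network, and the robot name "ExampleBot" is an assumption made for the example.

```python
from urllib.robotparser import RobotFileParser

# The same two rules as in the example above.
rules = """\
User-agent: *
Disallow: /
"""

parser = RobotFileParser()
# parse() accepts the file's lines, so no network request is needed here.
parser.parse(rules.splitlines())

# "ExampleBot" is a made-up robot name; it is matched by "User-agent: *".
url = "http://www.example.com/welcome.html"
allowed = parser.can_fetch("ExampleBot", url)
print(allowed)  # False: "Disallow: /" covers every path on the site
```

A well-behaved crawler would run a check like this before each request and skip any URL for which it returns False.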

There are two important considerations when using /robots.txt:

Robots can ignore your /robots.txt. In particular, malware robots that scan the web for security vulnerabilities and email address harvesters used by spammers will pay no attention to it.
The /robots.txt file is publicly available. Anyone can see which sections of your server you don't want robots to use, so don't try to use /robots.txt to hide information.

More information can be found at http://www.robotstxt.org/.

1426 questions

1 answer

Evidences for automatic browsing-Log file analysis

I'm not quite sure whether this is the suitable forum to post my question. I'm analyzing web server logs in both Apache and IIS log formats. I want to find evidence of automatic browsing (e.g. web robots, spiders, bots, etc.). I used python…

security robots.txt access-log

asked Jun 30 '13 at 11:32

Nilani Algiriyage

2 answers

Having problems understanding how to block some URLs on robot.txt

The problem is this. I have some URLs on the system I have that have this pattern http://foo-editable.mydomain.com/menu1/option2 http://bar-editable.mydomain.com/menu3/option1 I would like to indicate in the robot.txt file that they should not be…

web robots.txt googlebot

asked Jun 25 '13 at 15:55

paddingtonMike

1 answer

Drupal Aegir - Symlinked files directory and multisite robots.txt

I'm using Aegir/Barracuda/Nginx to maintain a multisite setup. My "files" directory is symlinked to a mounted "files" directory. Therefore when I clone a site to be used for dev purposes it uses the same "files" directory. The problem with the…

nginx drupal-7 robots.txt

asked Jun 21 '13 at 18:26

Meggy

1 answer

Googlebot and Bingbot crawling DNN site

I have a DNN site with over 20,000 pages. The Googlebot and Bingbot are consistently crawling my website. When I look at my sitelog I can see that google and bing are crawling my site via the pageid (ex: www.url.com/Default.aspx?TabID=5000) The…

dotnetnuke robots.txt googlebot bingbot

asked Jun 14 '13 at 02:53

Cesar

1 answer

Can I use robots.txt to send the robots to a specific folder?

So I have a regular website and a blog in the same domain. In the future I plan on buying a domain exclusively for the blog but for now this is the way I'll do it. The blog is in the directory /blog and there are no links from the main site to the…

search-engine robots.txt

asked Jun 14 '13 at 02:12

Pier

2 answers

Blocking URLs that contain numbers in robots.txt

My website allows search engines to index the same page in 2 formats like: www.example.com/page-1271.html www.example.com/page-1271-page-title.html All my site pages are like that. So, how can I block the first format in the robots.txt file? I mean…

url robots.txt

asked Jun 10 '13 at 21:44

hatem tawfik

2 answers

how to block multiple links in robot.txt with one line?

I have many pages whose links are as follows: http://site.com/school_flower/ http://site.com/school_rose/ http://site.com/school_pink/ etc. I can't block them manually. How could I block these kinds of pages, when I have hundreds of links of above…

robots.txt

asked Jun 07 '13 at 10:29

user2170554

0 answers

Editing robot.txt files in wp

XML sitemap generator plugin simply put the following string in robot.txt file, if we see so many wp blogs they have lots of tags included in it. also my xml file looks like "sitemap.xml.gz" this, User-agent: * Disallow: /wp-admin/ Disallow:…

wordpress sitemap robots.txt

asked Jun 07 '13 at 04:23

Naruto

1 answer

robots.txt codes for exclude several dir in one dir

I want Disallow google images to index my images in these path please let me know that am i right for this code in robots.txt. /images/otherimages/dir1/here are several images /images/otherimages/dir2/here are several images User-agent:…

robots.txt

asked Jun 04 '13 at 15:17

Kaveh

0 answers

RewriteRule causing redirection

I have IIS 7.5 with ISAPI_Rewrite(Helicon) I'm trying to do so that the robots.txt from each hosted site will be the same. For that purpose I have one dummy site(sometestsite.com) which has robots1.txt(which I want to be reused on each other…

mod-rewrite redirect iis-7.5 robots.txt isapi-rewrite

asked May 24 '13 at 16:19

Vladimirs

0 answers

UAT site is not searchable(crawable)

We have production and test environment as any other company. And I was thinking to put a robots.txt into the UAT root folder that Google web crawler would not do an unwarranted crawl on the uat pages. But what I found out was surprising. I do not…

.net iis web-crawler robots.txt

asked May 10 '13 at 21:02

Lost

1 answer

Avoid robots from going into a www.domain.com/thishash when link posted to twitter, facebook

I'm building a service where people gets notified (mails) when they follow a link with the format www.domain.com/this_is_a_hash. The people that use this server can share this link on different places like, twitter, tumblr, facebook and more... The…

ruby-on-rails ruby-on-rails-3 robots.txt web-crawler

asked May 10 '13 at 18:26

Andres

2 answers

blocked links in sitemap

I'm using an online sitemap generator tool which generates links even for pages that are blocked in robots.txt. Do these blocked links affect site ranking? Is there any way to overcome it?

robots.txt sitemap

asked Oct 29 '09 at 13:10

ArK

1 answer

Disallow subdomain url using robots.txt

i would like to ask you a question... i have a domain kiosban.com and store.kiosban.com.. and i want to disallow store.kiosban.com/template/* And i have this on my store.kiosban.com/robots.txt but when i look at google webmaster tools... on health…

subdomain robots.txt

asked Apr 16 '13 at 02:41

user2070749

1 answer

How to properly split a site?

Suppose I have a new version of a website: http://www.mywebsite.com and I would like to keep the older site in a sub-directory and treat it separately: http://www.mywebsite.com/old/ My new site has a link to the old one on the main page,…

web sitemap robots.txt google-search-console

asked Apr 10 '13 at 16:52

Maximus
