Questions tagged [robots.txt]

Robots.txt (the Robots Exclusion Protocol) is a text file placed in the root of a website's domain to give instructions to compliant web robots (such as search engine crawlers) about which pages to crawl and which to leave alone, as well as other information such as a Sitemap location. In modern frameworks it can be useful to generate the file programmatically. General questions about Search Engine Optimization are better suited to the Webmasters Stack Exchange site.

Website owners use the /robots.txt file to give instructions about their site to web robots; this is called The Robots Exclusion Protocol.

It works like this: a robot wants to visit a website URL, say http://www.example.com/welcome.html. Before it does so, it first checks for http://www.example.com/robots.txt, and finds:

User-agent: *
Disallow: /

The "User-agent: *" means this section applies to all robots. The "Disallow: /" tells the robot that it should not visit any pages on the site.
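Real-world files are usually more selective than "Disallow: /". As a purely illustrative sketch (the directory names and sitemap URL here are made up), a robots.txt that blocks only parts of a site and advertises a Sitemap could look like:

User-agent: *
Disallow: /admin/
Disallow: /tmp/

Sitemap: http://www.example.com/sitemap.xml

An empty "Disallow:" (or no Disallow line at all) allows everything; the Sitemap line sits outside any User-agent section and applies to all crawlers.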

There are two important considerations when using /robots.txt:

  • robots can ignore your /robots.txt. In particular, malware robots that scan the web for security vulnerabilities, and email-address harvesters used by spammers, will pay no attention to it.
  • the /robots.txt file is publicly available. Anyone can see which sections of your server you don't want robots to visit, so don't try to use /robots.txt to hide information.
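The fetch-and-check behaviour described above is what compliant crawler libraries implement. A minimal sketch using Python's standard-library urllib.robotparser, parsing the example rules inline (the bot name "ExampleBot" is arbitrary):

```python
from urllib.robotparser import RobotFileParser

# Parse the example rules shown above; a real crawler would instead call
# rp.set_url("http://www.example.com/robots.txt") and rp.read() to fetch
# the live file before making any other request to the host.
rp = RobotFileParser()
rp.parse([
    "User-agent: *",
    "Disallow: /",
])

# "Disallow: /" blocks every path for every robot, so this prints False.
print(rp.can_fetch("ExampleBot", "http://www.example.com/welcome.html"))
```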

More information can be found at http://www.robotstxt.org/.

1426 questions
7 votes, 2 answers

Defaults for robots meta tag

If I don't specify a robots meta tag in the head of the document, the defaults are: My question is, if I only specify "noindex", is the default still "follow"? So if I specify this below, is the default…
7 votes, 2 answers

Ban robots from website

My website is often down because a spider is accessing too many resources. This is what the hosting company told me. They told me to ban these IP addresses: 46.229.164.98 46.229.164.100 46.229.164.101 But I've no idea how to do this. I've googled a bit…
testermaster (1,031)
7 votes, 1 answer

Wildcards in robots.txt

If in a WordPress website I have categories in this order: -Parent --Child ---Subchild and I have permalinks set to: %category%/%postname% Let's use an example. I create a post with the post name "Sport game". Its tag is sport-game. Its full URL is:…
user3238424 (175)
7 votes, 2 answers

How to disallow all dynamic URLs in robots.txt

how to disallow all dynamic urls in robots.txt Disallow: /?q=admin/ Disallow: /?q=aggregator/ Disallow: /?q=comment/reply/ Disallow: /?q=contact/ Disallow: /?q=logout/ Disallow: /?q=node/add/ Disallow: /?q=search/ Disallow:…
pmarreddy (281)
7 votes, 1 answer

Allow only Google CSE and disallow Google standard search in ROBOTS.txt

I have a site that I am using a Google Custom Search Engine on. I want Google CSE to crawl my site but I want it to stay out of the results of a regular Google search. I put this in my robots.txt file hoping that google CSE bots would ignore it…
Bender (361)
7 votes, 8 answers

How to ban crawler 360Spider with robots.txt or .htaccess?

I've got a problem because of 360Spider: this bot makes too many requests per second to my VPS and slows it down (CPU usage rises to 10-70%, whereas usually it is 1-2%). I looked into the httpd logs and saw lines like these: 182.118.25.209 - -…
kovpack (4,905)
7 votes, 1 answer

How to restrict the site from being indexed

I know this question was being asked many times but I want to be more specific. I have a development domain and moved the site there to a subfolder. Let's say from: http://www.example.com/ To: http://www.example.com/backup So I want the subfolder…
Ilian Andreev (1,071)
6 votes, 4 answers

robots.txt content itself is indexed?

The contents of my robots.txt file are themselves indexed and show up in Google search results. It's only Google, not Yahoo for example. I really think Google should understand not to index the contents of my robots file as it's only there…
michael (652)
6 votes, 3 answers

robots.txt: user-agent: Googlebot disallow: / Google still indexing

Look at the robots.txt of this site: fr2.dk/robots.txt The content is: User-Agent: Googlebot Disallow: / That ought to tell Google not to index the site, no? If true, why does the site appear in Google searches?
Anders (147)
6 votes, 4 answers

Googlebot not respecting Robots.txt

For some reason when I check on Google Webmaster Tool's "Analyze robots.txt" to see which urls are blocked by our robots.txt file, it's not what I'm expecting. Here is a snippet from the beginning of our file: Sitemap:…
Andrew
6 votes, 3 answers

Why am I getting a 403 for Google AdSense on my verified site?

AdSense shows that it is verified. I have waited about 10 hours and even the placeholder for ads is not appearing. AdSense does not show any Policy violations, Crawler errors, or messages. I found this while inspecting the headers for the adsense…
Dshiz (3,099)
6 votes, 2 answers

Prevent API Gateway from receiving requests for a robots.txt file

I've been working on a new project that leverages an API Gateway mapped to a lambda function. The lambda function contains a Kestrel .NET web server that receives requests via proxy through API Gateway. I have remapped API Gateway to an actual…
I. Buchan (421)
6 votes, 2 answers

robots.txt in Laravel

I was just wondering if the robots.txt file is supposed to work like general robots.txt files. So, you type for example "disallow/admin/*", place it into the root Laravel folder, and that's it. Is it like this?
rolfo85 (717)
6 votes, 2 answers

Robots.txt not working

I have used robots.txt to restrict one of the folders in my site. The folder contains the sites that are under construction. Google has indexed all those sites which are in the testing phase. So I used robots.txt. I first submitted the site and robots.txt…
user75472 (1,277)
6 votes, 1 answer

Robots.txt, disallow multilanguage URL

I have a public page that is not supposed to be possible for users to sign into. So I have a URL that there is no link to and that you have to enter manually, and then sign in. The URL is multilanguage, however, so it can be "/SV/Account/Logon" or…
Oskar Kjellin (21,280)