Questions tagged [robots.txt]

Robots.txt (the Robots Exclusion Protocol) is a text file placed in the root of a website's domain to give instructions to compliant web robots (such as search engine crawlers) about which pages to crawl and not crawl, as well as other information such as a Sitemap location. In modern frameworks it can be useful to programmatically generate the file. General questions about Search Engine Optimization are more appropriate on the Webmasters Stack Exchange site.

Website owners use the /robots.txt file to give instructions about their site to web robots; this is called The Robots Exclusion Protocol.

It works like this: a robot wants to visit a website URL, say http://www.example.com/welcome.html. Before it does so, it first checks for http://www.example.com/robots.txt, and finds:

User-agent: *
Disallow: /

The "User-agent: *" means this section applies to all robots. The "Disallow: /" tells the robot that it should not visit any pages on the site.

There are two important considerations when using /robots.txt:

  • Robots can ignore your /robots.txt. In particular, malware robots that scan the web for security vulnerabilities, and email-address harvesters used by spammers, will pay no attention.
  • The /robots.txt file is publicly available. Anyone can see which sections of your server you don't want robots to use, so don't try to use /robots.txt to hide information.

More information can be found at http://www.robotstxt.org/.
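As the description above notes, modern frameworks often generate robots.txt programmatically instead of serving a static file. A minimal sketch using FastAPI (one of the frameworks asked about below); the route content and path rules are illustrative assumptions, not recommendations:

from fastapi import FastAPI
from fastapi.responses import PlainTextResponse

app = FastAPI()

@app.get("/robots.txt", response_class=PlainTextResponse)
def robots() -> str:
    # Building the file in code lets the rules vary per environment,
    # e.g. a staging deployment could return "Disallow: /" instead.
    return (
        "User-agent: *\n"
        "Disallow: /private/\n"
        "Sitemap: https://www.example.com/sitemap.xml\n"
    )

The same pattern (a route that returns text/plain) exists in Django, ASP.NET MVC, and the other frameworks that come up in the questions below.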

1426 questions
8
votes
2 answers

Multiple User Agents in Robots.txt

In my robots.txt file I have the following sections: User-Agent: Bot1 Disallow: /A User-Agent: Bot2 Disallow: /B User-Agent: * Disallow: /C Will the statement Disallow: /C be visible to Bot1 & Bot2?
GoodSp33d
  • 6,252
  • 4
  • 35
  • 67
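The crux of the question above is the group-selection rule: a compliant robot obeys only the single group whose User-agent line best matches it, not the union of all groups. So Bot1 follows only Disallow: /A, Bot2 only Disallow: /B, and the * group applies only to robots matched by neither. If /C should be off-limits to the named bots as well, it has to be repeated in their groups, roughly:

User-agent: Bot1
Disallow: /A
Disallow: /C

User-agent: Bot2
Disallow: /B
Disallow: /C

User-agent: *
Disallow: /C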
7
votes
1 answer

FastAPI, robots.txt and noindex

Does FastAPI need robots.txt and the noindex tag? I am creating a business API app which shouldn't be called anonymously. So I wonder whether I have to prepare robots.txt and the noindex tag in order to avoid any crawler's action or not. I made…
tomo
  • 71
  • 2
7
votes
1 answer

Java robots.txt parser with wildcard support

I'm looking for a robots.txt parser in Java which supports the same pattern-matching rules as the Googlebot. I've found some libraries to parse robots.txt files, but none of them supports Googlebot-style pattern matching: Heritrix (there is an…
clement
  • 81
  • 5
7
votes
1 answer

Should sitemap be disallowed in robots.txt? And robots.txt itself?

This is a very basic question, but I can't find a direct answer anywhere online. When searching for my website on Google, sitemap.xml and robots.txt are returned as search results (amongst more useful results). To prevent this, should I add the…
RLJ
  • 135
  • 2
  • 6
  • 10
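Disallowing robots.txt inside robots.txt is self-defeating (a robot would have to fetch the file to learn it may not fetch the file), and disallowing sitemap.xml would stop crawlers from reading the sitemap at all. The usual way to keep such files out of search results is to have the web server send an X-Robots-Tag response header for them, which removes a URL from the index without blocking crawling:

X-Robots-Tag: noindex

How that header gets attached to the two files depends on the server (Apache, nginx, IIS, etc.).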
7
votes
1 answer

"Lighthouse was unable to download a robots.txt file" despite the file being accessible

I have a NodeJS/NextJS app running at http://www.schandillia.com. The project has a robots.txt file accessible at http://www.schandillia.com/robots.txt. As of now, the file is bare-bones for testing purposes: User-agent: * Allow: / However, when I…
TheLearner
  • 2,813
  • 5
  • 46
  • 94
7
votes
1 answer

React Router v4: serve static file (robots.txt)

How can I put my robots.txt file at the path www.domain.com/robots.txt? No server is used; it's only a frontend with React Router. robots.txt --> in root folder ./ app.js --> in src folder ./src/ (...) export class App extends React.Component { …
7
votes
2 answers

Robots.txt: Disallow subdirectory but allow directory

I want to allow crawling of files in: /directory/ but not crawling of files in: /directory/subdirectory/ Is the correct robots.txt instruction: User-agent: * Disallow: /subdirectory/ I'm afraid that if I disallowed /directory/subdirectory/ that I…
user523521
  • 121
  • 1
  • 8
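For the question above: Disallow rules match URL paths by prefix, starting from the root. A rule that excludes only the subdirectory leaves its parent crawlable:

User-agent: *
Disallow: /directory/subdirectory/

The rule quoted in the question, Disallow: /subdirectory/, would match almost nothing here, because no URL on the site actually begins with /subdirectory/.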
7
votes
2 answers

Twitter meta image is not rendering on Twitter because it "may be restricted by the site's robots.txt file"

So this is the link. When I try it on Twitter the image somehow doesn't work, while it works for Facebook. It is working for Facebook only, but for Twitter I am getting this issue: WARN: The image URL…
ujwal dhakal
  • 2,289
  • 2
  • 30
  • 50
7
votes
1 answer

Robots.txt: Is this wildcard rule valid?

Simple question. I want to add: Disallow */*details-print/ Basically, blocking rules in the form of /foo/bar/dynamic-details-print --- foo and bar in this example can also be totally dynamic. I thought this would be simple, but then on…
Bartek
  • 15,269
  • 2
  • 58
  • 65
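On the wildcard question above: the original standard has no wildcards, but major crawlers such as Googlebot additionally support * (match any characters) and $ (match end of URL) in path rules. In that dialect the rule should begin with a slash, roughly:

User-agent: *
Disallow: /*details-print/

Robots that don't implement wildcards treat the value as a literal path prefix, so for them this line is effectively a no-op rather than an over-broad block.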
7
votes
1 answer

Stop google indexing subdomain

I have a subdomain "klient" for testing our clients' websites, and I don't want it to be indexed. I have set this in robots.txt (in the root of our web): User-agent: * disallow: /subdom/klient/* But I'm not sure if it really works, because I…
stepik21
  • 2,610
  • 3
  • 22
  • 32
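On the subdomain question above: robots.txt is fetched per host, so rules in the main domain's file don't govern what crawlers do on the subdomain. The test subdomain needs its own file at its own root; to keep it out entirely, a file served at http://klient.example.com/robots.txt (hostname hypothetical) could simply read:

User-agent: *
Disallow: /

Note that Disallow only blocks crawling; pages discovered through external links can still be indexed without content, so a noindex mechanism is the stronger tool for de-indexing.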
7
votes
5 answers

BOT/Spider Trap Ideas

I have a client whose domain seems to be getting hit pretty hard by what appears to be a DDoS. In the logs it's normal-looking user agents with random IPs, but they're flipping through pages too fast to be human. They also don't appear to be…
Mikey1980
  • 971
  • 4
  • 15
  • 24
7
votes
3 answers

Best practice to create robots.txt file inside my asp.net mvc web site

I want to create a robots.txt for my ASP.NET MVC 5 web site. I found this link, which talks about achieving this task: http://rehansaeed.com/dynamically-generating-robots-txt-using-asp-net-mvc/ In this link they are creating a separate…
user1404577
7
votes
3 answers

robots.txt parser java

I want to know how to parse robots.txt in Java. Is there already any code?
zahir hussain
  • 3,711
  • 10
  • 29
  • 36
7
votes
1 answer

Django - Loading Robots.txt through generic views

I have uploaded robots.txt into my templates directory on my production server. I am using generic views; from django.views.generic import TemplateView (r'^robots\.txt$', TemplateView.as_view(template_name='robots.txt',…
Uma
  • 689
  • 2
  • 12
  • 35
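The Django excerpt above is cut off mid-snippet; a complete urls.py in the same TemplateView style, written in current Django syntax and with a content_type added so the file isn't served as text/html, might look like this sketch:

from django.urls import re_path
from django.views.generic import TemplateView

urlpatterns = [
    # Render templates/robots.txt as plain text at /robots.txt
    re_path(r'^robots\.txt$', TemplateView.as_view(
        template_name='robots.txt',
        content_type='text/plain',
    )),
]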
7
votes
2 answers

Azure domain being indexed by google

I have a website that has a domain 'example.azurewebsites.net'. I also have a custom domain configured for it 'www.example.com'. Google is indexing my 'example.azurewebsites.net' website and I want it to stop and only index it has…
user342706