
My robots.txt in Google Webmaster Tools shows the following values:

User-agent: *
Allow: /

What does it mean? I don't have enough knowledge about it, so I'm looking for your help. I want to allow all robots to crawl my website; is this the right configuration?

Raajpoot
  • Allow is not understood by all web crawlers; use Disallow: (i.e., with no URL after the :) instead. It is safer (see: https://youtu.be/G29Zt-UH_Ko) – Jérôme Verstrynge Sep 09 '15 at 18:56
  • 1
    See also on Webmasters: [What is a minimum valid robots.txt file?](https://webmasters.stackexchange.com/questions/56720/what-is-a-minimum-valid-robots-txt-file) – Stephen Ostermiller Nov 08 '21 at 14:13

5 Answers


That file will allow all crawlers access:

User-agent: *
Allow: /

This basically allows all user agents (the *) access to all parts of the site (the /).
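
If you want to check this behaviour yourself, here is a minimal sketch using Python's standard-library urllib.robotparser (the bot names and example.com URLs are just placeholders):

from urllib import robotparser

# Feed the exact robots.txt content from the question to the parser.
rp = robotparser.RobotFileParser()
rp.parse([
    "User-agent: *",
    "Allow: /",
])

# Every user agent may fetch every path.
print(rp.can_fetch("Googlebot", "https://example.com/"))            # True
print(rp.can_fetch("AnyOtherBot", "https://example.com/any/page"))  # True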

Jim
  • Correct, unless you need to negate the allow part. There is no "Allow" in the original standard, so make it: "User-agent: * Disallow:" like they show here: http://www.robotstxt.org/robotstxt.html – Misbit Jan 08 '15 at 13:46
  • There is an allow part. Check the official Google docs: https://developers.google.com/search/reference/robots_txt#allow – Hasan Sefa Ozalp Jul 29 '20 at 06:24
  • I'm downvoting this answer because `Allow:` is a non-standard addition to robots.txt. The original standard only has `Disallow:` directives. This answer will work for Googlebot and some other search engines, but it isn't universal. The universal way is to disallow nothing, as stated in [unor's answer](https://stackoverflow.com/a/44467157/). – Stephen Ostermiller Nov 08 '21 at 14:06

If you want to allow every bot to crawl everything, this is the best way to specify it in your robots.txt:

User-agent: *
Disallow:

Note that the Disallow field has an empty value, which means according to the specification:

Any empty value, indicates that all URLs can be retrieved.


Your way (with Allow: / instead of Disallow:) works, too, but Allow is not part of the original robots.txt specification, so it’s not supported by all bots (many popular ones support it, though, like the Googlebot). That said, unrecognized fields have to be ignored, and for bots that don’t recognize Allow, the result would be the same in this case anyway: if nothing is forbidden to be crawled (with Disallow), everything is allowed to be crawled.
However, formally (per the original spec) it’s an invalid record, because at least one Disallow field is required:

At least one Disallow field needs to be present in a record.
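
As a quick sanity check of the equivalence described above, here is a sketch using Python's standard-library urllib.robotparser, which happens to be one of the parsers that understands the Allow extension (the bot name and URL are placeholders):

from urllib import robotparser

def allows_everything(lines):
    # Parse a robots.txt record and probe an arbitrary sample path.
    rp = robotparser.RobotFileParser()
    rp.parse(lines)
    return rp.can_fetch("AnyBot", "https://example.com/any/path")

# Original-spec form: an empty Disallow value blocks nothing.
print(allows_everything(["User-agent: *", "Disallow:"]))  # True

# Non-standard (but widely supported) Allow form: same result here.
print(allows_everything(["User-agent: *", "Allow: /"]))   # True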

unor

I understand that this is a fairly old question and has some pretty good answers, but here are my two cents for the sake of completeness.

As per the official documentation, there are four ways you can allow robots complete access to your site.

Clean:

Specify a global matcher with an empty disallow segment, as mentioned by @unor. Your /robots.txt then looks like this:

User-agent: *
Disallow:

The hack:

Create a /robots.txt file with no content in it, which defaults to "allow all" for all types of bots.

I don't care way:

Do not create a /robots.txt at all. This should yield the exact same result as the above two (see the sketch below).
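
Here is how the first three options look from a crawler's side, again with Python's standard-library urllib.robotparser (URL and bot name are placeholders; the missing-file case is only described in a comment, since demonstrating a 404 would need a live server):

from urllib import robotparser

def can_fetch_sample(lines):
    rp = robotparser.RobotFileParser()
    rp.parse(lines)
    return rp.can_fetch("AnyBot", "https://example.com/some/page")

print(can_fetch_sample(["User-agent: *", "Disallow:"]))  # True (the clean way)
print(can_fetch_sample([]))                              # True (the empty-file hack)
# For the "I don't care" way: RobotFileParser.read() treats a 404 for
# /robots.txt as "allow all", so the result is the same again.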

The ugly:

From the robots documentation for meta tags, you can use the following meta tag on all pages of your site to let the bots know that these pages are supposed to be indexed and their links followed:

<META NAME="ROBOTS" CONTENT="INDEX, FOLLOW">

In order for this to be applied to your entire site, you will have to add this meta tag to all of your pages, and it should be placed inside the HEAD tag of each page. More about this meta tag here.
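
To illustrate what a crawler looks for, here is a small sketch that extracts robots meta directives from a page with Python's standard-library html.parser (the sample page is made up):

from html.parser import HTMLParser

class RobotsMetaFinder(HTMLParser):
    """Collects the content of every <meta name="robots"> tag on a page."""
    def __init__(self):
        super().__init__()
        self.directives = []

    def handle_starttag(self, tag, attrs):
        attrs = dict(attrs)
        if tag == "meta" and attrs.get("name", "").lower() == "robots":
            self.directives.append(attrs.get("content", ""))

finder = RobotsMetaFinder()
finder.feed('<html><head><meta name="robots" content="index, follow">'
            '</head><body>Hello</body></html>')
print(finder.directives)  # ['index, follow']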

Raja Anbazhagan

It means you allow every (*) user-agent/crawler to access the root (/) of your site. You're okay.

Jordi
  • There is no "Allow" field according to http://www.robotstxt.org/robotstxt.html, so I'd be careful with it. Wikipedia mentions "Some major crawlers support an Allow directive which can counteract a following Disallow directive.": http://en.wikipedia.org/wiki/Robots_exclusion_standard#Allow_directive – Mackaaij Dec 04 '14 at 20:17

I think you are good; you're allowing all pages to be crawled:

User-agent: *
Allow: /

Asma Alfauri