Meta tag vs robots.txt

Question

Is it better to use meta tags* or the robots.txt file for informing spiders/crawlers to include or exclude a page?
Are there any issues in using both the meta tags and the robots.txt?

*E.g.: <meta name="robots" content="index, follow">

This is a programming related question in terms of web development. — Tom, Aug 08 '10 at 23:42
It is preferred if you can post separate questions instead of combining your questions into one. That way, it helps the people answering your question and also others hunting for at least one of your questions. Thanks! — Hille, Mar 04 '19 at 13:51

score 52 · Answer 1 · edited Feb 11 '19 at 20:52

There is one significant difference. According to Google, they may still index a URL blocked by a robots.txt Disallow rule if another site links to it.

However, they will not index it if they see a noindex meta tag:

While Google won't crawl or index the content blocked by robots.txt, we might still find and index a disallowed URL from other places on the web. As a result, the URL address and, potentially, other publicly available information such as anchor text in links to the site can still appear in Google search results. You can stop your URL from appearing in Google Search results completely by using other URL blocking methods, such as password-protecting the files on your server or using the noindex meta tag or response header.
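The reasoning in this answer can be sketched in Python. This is only an illustration, not how Google actually works: `sees_noindex`, `RobotsMetaParser`, the example URL and the page markup are all hypothetical, and the sketch relies only on the standard library's `urllib.robotparser` and `html.parser`. It models a compliant crawler that checks robots.txt first and reads the page's meta tag only if it is allowed to fetch the page.

```python
from html.parser import HTMLParser
from urllib.robotparser import RobotFileParser


class RobotsMetaParser(HTMLParser):
    """Collects directives from <meta name="robots" content="..."> tags."""

    def __init__(self):
        super().__init__()
        self.directives = set()

    def handle_starttag(self, tag, attrs):
        attr = dict(attrs)
        if tag == "meta" and (attr.get("name") or "").lower() == "robots":
            content = attr.get("content") or ""
            for directive in content.split(","):
                if directive.strip():
                    self.directives.add(directive.strip().lower())


def sees_noindex(robots_txt, url, html):
    """True only if the crawler may fetch the page AND finds noindex in it."""
    rules = RobotFileParser()
    rules.parse(robots_txt.splitlines())
    if not rules.can_fetch("*", url):
        # The page is never fetched, so its meta tag is never read.
        return False
    parser = RobotsMetaParser()
    parser.feed(html)
    return "noindex" in parser.directives


# Hypothetical page and URL, for illustration only.
PAGE = '<html><head><meta name="robots" content="noindex, follow"></head></html>'
URL = "https://example.com/private/report.html"

blocked = sees_noindex("User-agent: *\nDisallow: /private/", URL, PAGE)
visible = sees_noindex("User-agent: *\nDisallow:", URL, PAGE)
```

With the Disallow rule in place, `blocked` is False because the page is never fetched and the noindex directive goes unread. With an empty Disallow, `visible` is True because the crawler can load the page and read the tag.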

edited Feb 11 '19 at 20:52

Andy

answered Aug 19 '13 at 14:27

user2696762

7

And according to these [1](http://evolvedigitallabs.com/blog/robots-txt-vs-noindex-differences), [2](http://etechdiary.com/robots-txt-vs-noindex-deindex-your-site-the-right-way), [3](http://moz.com/learn/seo/robotstxt) pages, it's not just google. In general, the meta tag is used to disallow indexing, whereas robots.txt is used to disallow crawling. – zrisher Aug 01 '14 at 23:07
1

+1, and I took the liberty to update your post with a quote from the linked page, should its contents change! – BenMorel Mar 10 '17 at 13:59
1

@zrisher If I follow you correctly, it is the case that – **if we want no crawling or indexing of a page** – then we should **_both_ block the page on robots.txt and use the `noindex, nofollow` tag in the page's meta tags.** Sorry for the seeming redundant question here, some of the citation links you have provided are now dead. – Parapluie Dec 27 '19 at 17:07
2

@Parapluie no worries at all, in fact it is an excellent question and the answer is non-obvious. On [this page](https://support.google.com/webmasters/answer/93710) google tells us that if we follow your suggestion of blocking it with both robots.txt and meta tags, if it was already indexed your changed meta tag would be *ignored* and thus it would remain indexed, because google was not allowed to crawl the page to see the new tag! So the answer is **always provide the meta tag**. You can provide the robots entry (to reduce requests) once you know it's been dropped, or if it was never indexed. – zrisher Dec 27 '19 at 21:19
1

@zrisher I've been looking at this subject off and on for some months, and this is by far the clearest and most concise answer. You've set forth the simple logic which I had mistaken for a Google paradox! Sincere thanks for this clarification. – Parapluie Dec 29 '19 at 21:23

CJM · Accepted Answer · 2010-07-27T21:59:51.087

4

Robots.txt IMHO.

The Meta tag option tells bots not to index individual files, whereas Robots.txt can be used to restrict access to entire directories.

Sure, use a Meta tag if you have the odd page in indexed folders that you want skipped, but generally, I'd recommend you put most of your non-indexed content in one or more folders and use robots.txt to skip the lot.

No, there isn't a problem in using both - if there is a clash, in general terms, a deny will overrule an allow.

edited Jul 27 '10 at 21:59

answered Jul 27 '10 at 21:49

CJM

1

Although I tend to go for Robots.txt myself as well, is it not possible that dodgy robots could just use that file to get a convenient list of new directories it can spider? Whereas with the META tag, they'd have no way of finding a non-linked page in the first place... Just a thought! – Codecraft Mar 30 '11 at 09:30
1

@Codecraft That may be true, but that is why you should not display sensitive information to unauthorized users. `robots.txt` is used to tell crawlers what information is not worthwhile rather than what is private and must not be accessed. – Uyghur Lives Matter Feb 20 '15 at 17:11
I recommend all visitors to this page scroll down and check the next answer via @Benjamin, as it links to Google's documentation! [https://stackoverflow.com/a/18316292/1079503](https://stackoverflow.com/a/18316292/1079503) – Alex W Jan 24 '19 at 18:46

score 4 · Answer 3 · answered Jul 27 '10 at 21:50

Both are supported by all crawlers which respect webmasters' wishes. Not all do, but against those neither technique is sufficient.

You can use robots.txt rules for general things, such as disallowing whole sections of your site. If you say Disallow: /family, then compliant crawlers will not crawl any URL whose path starts with /family.

A meta tag can be used to disallow a single page. A meta tag on one page does not affect sub-pages in the page hierarchy. If you have a meta disallow tag on /work, it does not prevent a crawler from accessing /work/my-publications if an allowed page links to it.
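The prefix behaviour of a Disallow rule can be checked with Python's standard `urllib.robotparser`. This is a small sketch under assumptions: the host `example.com` and the extra paths are made up for illustration, and other crawlers may match rules differently from this module.

```python
from urllib.robotparser import RobotFileParser

# The rule from the text: block everything whose path starts with /family.
rules = RobotFileParser()
rules.parse(["User-agent: *", "Disallow: /family"])

# Hypothetical paths on a hypothetical host.
paths = [
    "/family",
    "/family/photos/2010.html",
    "/family-tree",            # also blocked: the rule is a plain prefix match
    "/work/my-publications",   # untouched by the /family rule
]

allowed = {
    path: rules.can_fetch("*", "https://example.com" + path)
    for path in paths
}

for path, ok in allowed.items():
    print(path, "->", "crawl" if ok else "skip")
```

Note that `/family-tree` is skipped as well, since the rule matches by prefix rather than by directory. Ending the rule with a slash (`Disallow: /family/`) would limit it to the directory's contents.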

score 1 · Answer 4 · answered Feb 15 '14 at 16:57

meta is superior.

In order to exclude individual pages from search engine indices, the noindex meta tag is actually superior to robots.txt.

answered Feb 15 '14 at 16:57

user2513846

score 1 · Answer 5 · answered Jul 18 '14 at 12:23

There is a big difference between the robots meta tag and robots.txt.

In robots.txt, we tell crawlers which pages to crawl and which to exclude, but we do not ask them to keep the excluded pages out of the index.

With the robots meta tag, however, we can ask search engine crawlers not to index a page. The tag to use for this is:

<meta name="robots" content="noindex">

OR

<meta name="robots" content="follow, noindex">

In the second meta tag, I have asked the robot to follow the links on that page but not to index the page in the search engine.

score 1 · Answer 6 · edited Mar 04 '19 at 20:16

Here is what I know about them, in terms of where each one applies. Both can be used to block content.

The difference between them is:

The robots meta tag can block a single page, using a small piece of code placed in the head of that page. The content value of the tag tells the search engine what to do with the page.
With a robots.txt file you can block the whole website.

Here is the example of meta robot:

<meta name="robots" content="index, follow"> 
<meta name="robots" CONTENT="all">
<meta name="robots" content="noindex, follow">
<meta name="robots" content="noindex, nofollow">
<meta name="robots" content="index, nofollow" />
<meta name="robots" content="noindex, nofollow" />

Here is the example of Robots.txt file:

Allowing crawlers to crawl the whole website:

User-agent: *
Disallow:

Disallowing crawlers from crawling the whole website:

User-agent: *
Disallow: /

score 0 · Answer 7 · edited Jan 23 '14 at 17:27

Robots.txt is good for pages that consume a lot of your crawl budget, such as internal search or filters with infinite combinations. If you allow Google to crawl yoursite.com/search=lalalala, it will waste your crawl budget.

edited Jan 23 '14 at 17:27

takendarkk

answered Jan 23 '14 at 17:03

Mathilde Joly

You still can disallow that using meta-tags, right? But the question was what is the difference between this approach and robots.txt. – FazoM Jan 23 '14 at 17:25
I don't think it is the same. If your rules are in robots.txt a crawler would just have to periodically load robots.txt in order to have an up-to-date view of what it's allowed to crawl. If your rules are in meta tags it would have to load every tagged page periodically to have an up-to-date view of the rules. – Keith Nov 03 '15 at 12:38

score 0 · Answer 8 · answered Aug 12 '14 at 18:31

You want to use 'noindex,follow' in a robots meta tag, rather than robots.txt, because it allows the link juice to pass through. It is better from an SEO perspective.

answered Aug 12 '14 at 18:31

Jérôme Verstrynge

score 0 · Answer 9 · answered Jul 27 '10 at 21:42

I would probably use robots.txt over the meta tag. Robots.txt has been around longer, and might be more widely supported (But I am not 100% sure on that).

As for the second part, I think most spiders will take whatever is the most restrictive setting for a page - if there is a disparity between the robots.txt and meta tag.

score 0 · Answer 10 · answered Jul 23 '19 at 11:07

Is it better to use meta tags* or the robots.txt file for informing spiders/crawlers to include or exclude a page?

Answer: Both are important, and they are used for different purposes. The robots file is used to include or exclude pages or root files from a spider's index, while meta tags are used to analyse a web page and describe its niche and content.

Are there any issues in using both the meta tags and the robots.txt?

Answer: Both should be implemented on a site so that search engine spiders/crawlers can index or de-index its URLs.

Read more about how search engine spiders work here: https://www.playbuzz.com/alexhuber10/how-search-and-spider-engines-work

score -1 · Answer 11 · answered Aug 20 '13 at 07:20

You can use either one, but if your website has many pages, robots.txt is easier and takes less time to maintain.

answered Aug 20 '13 at 07:20

James Andreson
