Questions tagged [robots.txt]

Robots.txt (the Robots Exclusion Protocol) is a text file placed in the root of a web site domain to give instructions to compliant web robots (such as search engine crawlers) about what pages to crawl and not crawl, as well as other information such as a Sitemap location. In modern frameworks it can be useful to programmatically generate the file. General questions about Search Engine Optimization are more appropriate on the Webmasters StackExchange site.

Website owners use the /robots.txt file to give instructions about their site to web robots; this is called The Robots Exclusion Protocol.

It works like this: a robot wants to visit a website URL, say http://www.example.com/welcome.html. Before it does so, it first checks for http://www.example.com/robots.txt, and finds:

User-agent: *
Disallow: /

The "User-agent: *" means this section applies to all robots. The "Disallow: /" tells the robot that it should not visit any pages on the site.

There are two important considerations when using /robots.txt:

  • robots can ignore your /robots.txt. In particular, malware robots that scan the web for security vulnerabilities, and email-address harvesters used by spammers, will pay no attention to it.
  • the /robots.txt file is a publicly available file. Anyone can see what sections of your server you don't want robots to use, so don't try to use /robots.txt to hide information.

More information can be found at http://www.robotstxt.org/.
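Compliant crawlers fetch and evaluate these rules before requesting any page. As a sketch of how that evaluation works, Python's standard-library urllib.robotparser can answer whether a given user agent may fetch a URL (the rules and the bot name below are illustrative):

```python
from urllib.robotparser import RobotFileParser

# Rules equivalent to the example above: block every robot from the whole site.
rules = [
    "User-agent: *",
    "Disallow: /",
]

parser = RobotFileParser()
parser.parse(rules)

# A compliant robot would run this check before fetching the page.
print(parser.can_fetch("SomeBot", "http://www.example.com/welcome.html"))  # False
```

In a real crawler you would call `parser.set_url("http://www.example.com/robots.txt")` and `parser.read()` instead of feeding the lines in by hand.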

1426 questions
-1 votes, 1 answer

How to disallow specific pages in robots.txt, but allow everything else?

Is this the way to do it? User-agent: * Allow: / Disallow: /a/* I have pages like: mydomaink.com/a/123/group/4 mydomaink.com/a/xyz/network/google/group/1 I don't want them to appear on Google.
TIMEX
-1 votes, 2 answers

robots.txt blocking crawlers from accessing page

I am trying to find out how to block crawlers from accessing links of mine that look like this: site.com/something-search.html I want to block all /something-* Can someone help me?
-1 votes, 1 answer

Search results on Google Image Search show slider images

Is there any way to disallow slider images in Google search results? I heard that creating a robots.txt will fix this problem. User-agent: Googlebot-Image Disallow: /images/abc.jpg — this is for a particular image, and I want to disallow a whole…
-1 votes, 1 answer

Googlebot guesses URLs. How to avoid/handle this crawling?

Googlebot is crawling our site. Based on our URL structure, it is guessing new possible URLs. Our structure is of the kind /x/y/z/param1.value. Now Googlebot exchanges the values of x, y, z and value with tons of different keywords. The problem is that…
Zensursula
-1 votes, 1 answer

Disallow rule in the robots file

Disallow: /*? For a website which has this in its robots.txt file, I am presuming all will be blocked before the ?. Is this true? All levels/folders before the ?
Arjan
-1 votes, 2 answers

Can I use robots.txt while handling my site with .htaccess?

I am using .htaccess on my site, such that all requests to my site are redirected to the index page in my root directory. No other file on my site can be accessed, because my .htaccess restricts it. My doubt is, when I use a robots.txt file, will…
Goysar
-1 votes, 1 answer

robots.txt block bots crawling subdirectory

I'd like to block all bots from crawling a subdirectory http://www.mysite.com/admin plus any files and folders in that directory. For example, there may be further directories inside /admin, such as http://www.mysite.com/admin/assets/img I'm not…
CaribouCode
-1 votes, 1 answer

robot.txt syntax not understood

I am getting this error on the first line of robot.txt User-agent: * my robot.txt is as follows: User-agent: * Disallow: /Search/ Disallow: /_layouts/ Disallow: /blog/_layouts/ Disallow: /Blog/_layouts/ Disallow: /ReusableContent/ Disallow:…
Preetam
-1 votes, 2 answers

Impact of blocking a subdirectory via robots.txt that belongs to another website

I have hosting setup on hostgator which creates addon domains inside a subdirectory of the main site. So the structure is public_html - main_site - main_site_folders - addon_site1 - addon_site2 and so on. If I disallow…
mehulved
-1 votes, 2 answers

How to Prevent spiders/search engines from following the 'report as offensive content' link

I have a Rails application. In my comments section I have a 'report as offensive content' link on some (article) pages of the site. I want the articles to be crawled by search engines, but not that particular 'report as offensive content' link. So, in…
Manish Shrivastava
-1 votes, 2 answers

making robots.txt

I am making a robots.txt for my website. Can anybody confirm that I am doing it correctly? If I am wrong, please tell me how to write it in the correct form. admincp, adminpp, etc. are folders on my hosting server: User-agent: * Disallow:…
Zohaib Baig
-1 votes, 3 answers

Prevent Google Indexing my Pagination System

I am wondering if there is a way to include in my robots.txt a line which stops Google from indexing any URL in my website, that contains specific text. I have different sections, all of which contain different pages. I don't want Google to index…
Cristian
-1 votes, 2 answers

How to block these kinds of URLs in a robots.txt file?

Here are two sets of conditions, and I want to block all URLs with /search/jobdescription? and search/jobdescription/ 1) http://<--sitename-->/search/jobdescription?id=0154613&css=a&act=a 2)…
Manojkumar
-1 votes, 1 answer

Robots.txt / How can I hide a single HTML tag from search engines?

I'm building a one-page portfolio that has a contact form/section at the bottom of it. Next to my form I have listed most services I can provide. This, however, creates too many keywords of the same kind, and thus I started to wonder how I could hide…
Marc Wiest
-2 votes, 2 answers

Add robots.txt for Laravel 9+

I want to add robots.txt to my Laravel project, but the robots.txt packages I found are not compatible with Laravel 9+, so if you know of any tutorial or package for the latest version of Laravel, please share. Thanks.
Leslie Joe