
I would like to disallow some URLs in my website's robots.txt file, but I am having some difficulties.

Right now my robots file has the following content:

User-agent: *

Allow: /
Disallow: /cgi-bin/

Sitemap: http://seriesgate.tv/sitemap.xml

I do not want Google to index the following URLs:

http://seriesgate.tv/watch-breakingbad-online/season5/episode8/searchresult/

There are about 8000 more URLs like this, so I need a rule in the robots file that blocks all of them.

I also want to disallow the search pages in the robots file so that they are not crawled by Google, for example this URL:

seriesgate.tv/search/indv_episodes/friends/

Any ideas?

unor
alikarimi

2 Answers


Add Disallow: /name_of_folder/ to stop Google from crawling a folder, and add Disallow: /file_name to stop Google from crawling a specific file.

Niket Malik
  • So what should I write in the robots file? My site has 400 shows, so I have to write something that covers all shows, not just Breaking Bad or the episode mentioned above (all shows and their episodes). – alikarimi Jul 15 '13 at 17:34
  • The best way is to put the shows in a folder named 'shows' and then simply add `Disallow: /shows/`, but this means changing the links in your code, which can be done with any code editor (using the find and replace option). – Niket Malik Jul 15 '13 at 17:56

First, your robots.txt (as included in your question) is invalid. There must not be a blank line after the User-agent line, because in the original robots.txt specification blank lines separate records.
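The point about the line after User-agent can be checked with a small sketch. It uses Python's standard-library `urllib.robotparser`, which happens to treat a blank line as the end of a record; other crawlers may be more lenient, so this only illustrates how a strict parser reads the file. The two robots.txt strings are trimmed-down variants of the file in the question (only the /cgi-bin/ rule is kept), and the helper name `blocks_cgi_bin` and the test URL are made up for illustration.

```python
from urllib.robotparser import RobotFileParser

# Trimmed variant of the question's file: blank line right after User-agent.
WITH_BLANK_LINE = "User-agent: *\n\nDisallow: /cgi-bin/\n"

# Same rule, but with no blank line inside the record.
WITHOUT_BLANK_LINE = "User-agent: *\nDisallow: /cgi-bin/\n"


def blocks_cgi_bin(robots_txt):
    """Return True if this robots.txt text blocks a URL under /cgi-bin/."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    # Hypothetical URL, used only to probe the rule.
    return not parser.can_fetch("Googlebot", "http://seriesgate.tv/cgi-bin/test")


if __name__ == "__main__":
    print("with blank line:   ", blocks_cgi_bin(WITH_BLANK_LINE))
    print("without blank line:", blocks_cgi_bin(WITHOUT_BLANK_LINE))
```

With this parser, the rule in the first variant is detached from its User-agent line and is dropped, while the second variant blocks the URL as intended.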

Second, you don’t need the Allow line, as everything that is not explicitly blocked is allowed anyway.

If all of the 8000 URLs you want to block start with "watch-", you could use:

Disallow: /watch-

For blocking the search results, you could use:

Disallow: /search/

Note that you have to check that no other pages, which you don’t want to block, are matched by these Disallow values.

So your robots.txt could look like:

User-agent: *
Disallow: /cgi-bin/
Disallow: /watch-
Disallow: /search/

Sitemap: http://seriesgate.tv/sitemap.xml

It would block URLs like:

  • http://seriesgate.tv/watch-
  • http://seriesgate.tv/watch-me
  • http://seriesgate.tv/watch-me.html
  • http://seriesgate.tv/watch-/
  • http://seriesgate.tv/search/
  • http://seriesgate.tv/search/some-result
  • http://seriesgate.tv/search/in-titles/foobar.html
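To check which URLs a set of Disallow values actually matches before deploying it, a quick test along these lines may help. This is a minimal sketch using Python's standard-library `urllib.robotparser`, which does simple prefix matching; real crawlers may differ in details. The helper name `is_blocked` and the URLs `/watching` and `/` are assumptions added for illustration; the other URLs come from the question and the list above.

```python
from urllib.robotparser import RobotFileParser

# The robots.txt proposed in this answer.
ROBOTS_TXT = """\
User-agent: *
Disallow: /cgi-bin/
Disallow: /watch-
Disallow: /search/

Sitemap: http://seriesgate.tv/sitemap.xml
"""


def is_blocked(url, robots_txt=ROBOTS_TXT, agent="Googlebot"):
    """Return True if the given robots.txt text disallows this URL for the agent."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    return not parser.can_fetch(agent, url)


if __name__ == "__main__":
    urls = [
        "http://seriesgate.tv/watch-breakingbad-online/season5/episode8/searchresult/",
        "http://seriesgate.tv/search/indv_episodes/friends/",
        "http://seriesgate.tv/watch-me.html",
        "http://seriesgate.tv/watching",  # hypothetical: no hyphen, so not matched
        "http://seriesgate.tv/",          # hypothetical: home page stays crawlable
    ]
    for url in urls:
        print("blocked" if is_blocked(url) else "allowed", url)
```

Adding a few URLs that must stay crawlable to the list is a simple way to catch a Disallow value that matches more than intended.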
unor