
I'm trying to find out how to block crawlers from accessing links of mine that look like this:

site.com/something-search.html

I want to block all /something-* URLs.

Can someone help me?

Kara

2 Answers


In your robots.txt

User-agent: *
Disallow: site.com/something-(1st link)
.
.
.
Disallow: site.com/something-(last link)

Add an entry for each page that you don't want to be seen!

Though regular expressions are not allowed in robots.txt, some intelligent crawlers can understand wildcard patterns!

Have a look here.
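Note that, as a commenter below points out, `Disallow` values should be URL paths rather than full URLs including the host. A corrected per-page sketch (the filenames here are hypothetical placeholders, not the asker's actual pages) would look like:

```
User-agent: *
Disallow: /something-first.html
Disallow: /something-second.html
```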

Nullpointer
  • The problem is that I can't know what the first and last links are; this is my search result page and I want to stop crawlers from accessing it. I tried Disallow: /search-* and got: "Wildcard characters (like "*") are not allowed here. The line below must be an allow, disallow, comment or a blank line statement" – user3260531 Feb 01 '14 at 14:02
  • No, you cannot use * in robots.txt – Nullpointer Feb 01 '14 at 14:06
  • `Disallow` must not contain the host (`site.com` in your case) of the URLs. – unor Feb 01 '14 at 14:08
User-agent: *
Disallow: /something-

This blocks all URLs whose path starts with /something-. For example, for a robots.txt accessible from http://example.com/robots.txt, the following URLs would be blocked:

  • http://example.com/something-
  • http://example.com/something-foo
  • http://example.com/something-foo.html
  • http://example.com/something-foo/bar

The following URLs would still be allowed:

  • http://example.com/something
  • http://example.com/something.html
  • http://example.com/something/
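You can verify this behavior with Python's standard-library `urllib.robotparser` (the example.com URLs below mirror the lists above; the robots.txt content is fed in directly rather than fetched):

```python
import urllib.robotparser

# The rules from the answer above, parsed from a string
# instead of being fetched over HTTP.
rules = [
    "User-agent: *",
    "Disallow: /something-",
]

rp = urllib.robotparser.RobotFileParser()
rp.parse(rules)

# Paths starting with /something- are blocked:
print(rp.can_fetch("*", "http://example.com/something-foo.html"))  # False
print(rp.can_fetch("*", "http://example.com/something-foo/bar"))   # False

# Paths that merely start with /something are still allowed:
print(rp.can_fetch("*", "http://example.com/something"))           # True
print(rp.can_fetch("*", "http://example.com/something/"))          # True
```

The key point is that a `Disallow` value is a path prefix, so no wildcard is needed to cover everything under /something-.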
unor