
I'm trying to find out how to block crawlers from accessing links of mine that look like this:

site.com/something-search.html

I want to block all /something-* URLs.

Can someone help me?

Kara

2 Answers


In your robots.txt

User-agent: *
Disallow: site.com/something-(1st link)
.
.
.
Disallow: site.com/something-(last link)

Add an entry for each page that you don't want to be seen!

Though regular expressions are not allowed in robots.txt, some intelligent crawlers can understand wildcard patterns!

Have a look here.
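Note that, as a commenter below points out, `Disallow` values should be URL paths rather than full URLs including the host. A corrected per-page sketch (the filenames here are hypothetical placeholders, not the asker's actual pages) would look like:

```
User-agent: *
Disallow: /something-first.html
Disallow: /something-second.html
```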

Nullpointer
  • The problem is that I can't know what the first and last links are; this is my search result page and I want to stop crawlers from accessing it. I tried Disallow: /search-* and got: "Wildcard characters (like "*") are not allowed here. The line below must be an allow, disallow, comment or a blank line statement" – user3260531 Feb 01 '14 at 14:02
  • No, you cannot use * in robots.txt – Nullpointer Feb 01 '14 at 14:06
  • `Disallow` must not contain the host (`site.com` in your case) of the URLs. – unor Feb 01 '14 at 14:08
User-agent: *
Disallow: /something-

This blocks all URLs whose path starts with /something-. For example, for a robots.txt accessible from http://example.com/robots.txt, the following URLs would be blocked:

  • http://example.com/something-
  • http://example.com/something-foo
  • http://example.com/something-foo.html
  • http://example.com/something-foo/bar

The following URLs would still be allowed:

  • http://example.com/something
  • http://example.com/something.html
  • http://example.com/something/
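You can verify this behavior with Python's standard-library `urllib.robotparser` (the example.com URLs below mirror the lists above; the robots.txt content is fed in directly rather than fetched):

```python
import urllib.robotparser

# The rules from the answer above, parsed from a string
# instead of being fetched over HTTP.
rules = [
    "User-agent: *",
    "Disallow: /something-",
]

rp = urllib.robotparser.RobotFileParser()
rp.parse(rules)

# Paths starting with /something- are blocked:
print(rp.can_fetch("*", "http://example.com/something-foo.html"))  # False
print(rp.can_fetch("*", "http://example.com/something-foo/bar"))   # False

# Paths that merely start with /something are still allowed:
print(rp.can_fetch("*", "http://example.com/something"))           # True
print(rp.can_fetch("*", "http://example.com/something/"))          # True
```

The key point is that a `Disallow` value is a path prefix, so no wildcard is needed to cover everything under /something-.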
unor