
Let's say I have a dynamic page that creates URLs from user input. For example: www.XXXXXXX.com/browse (browse being the page)

Every time a user enters a query, it generates more pages. For example: www.XXXXXXX.com/browse/abcd (abcd being the new page)

Now, I want Google to crawl this "browse" page but not the subpages generated under it.

I'm thinking of adding this to my robots.txt file: `Disallow: /browse/`

Would that be the right thing to do, or will it also prevent Googlebot from crawling the "browse" page itself? What should I do to get the optimal result?

Raj Sandhu
  • I believe you can use the meta tag `noindex, nofollow` on your inner pages, and just `nofollow` on your main pages, so Google will not "go in" further... http://www.robotstxt.org/meta.html – Ziv Weissman Dec 25 '15 at 20:57
  • These subpages number in the hundreds; I don't think I can add nofollow to every page manually :/ – Raj Sandhu Dec 25 '15 at 20:59

1 Answer


Your URL doesn't end with a slash:

www.XXXXXXX.com/browse

Therefore this rule should work:

User-agent: *
Disallow: /browse/
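
You can sanity-check this matching behavior yourself with Python's standard `urllib.robotparser` module, which follows the same prefix-matching rules. This is just a local sketch; the `example.com` host is a placeholder:

```python
from urllib.robotparser import RobotFileParser

# Parse the proposed robots.txt rules directly from a list of lines.
rp = RobotFileParser()
rp.parse([
    "User-agent: *",
    "Disallow: /browse/",
])

# /browse does NOT start with the disallowed prefix /browse/, so it stays crawlable.
print(rp.can_fetch("*", "http://www.example.com/browse"))       # True

# /browse/abcd starts with /browse/, so it is blocked.
print(rp.can_fetch("*", "http://www.example.com/browse/abcd"))  # False
```

The trailing slash is what makes the difference: `Disallow` values are prefix matches, and `/browse` is not a prefix match for `/browse/`.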
znurgl
  • That's what I've been thinking. I have a doubt though, Are you sure that it will not block the page "browse" itself? – Raj Sandhu Dec 25 '15 at 23:19
  • Yes, I'm sure, because the Disallow path contains a trailing slash. – znurgl Dec 25 '15 at 23:22
  • Well, thanks a lot. Just for my knowledge: if I had to disallow only "browse" itself, I would simply not put the "/" at the end, and that's it? – Raj Sandhu Dec 25 '15 at 23:25
  • @RajSandhu: In your question, the example URL *does* end with a slash (`www.XXXXXXX.com/browse/`). Is this intentional? If yes, the robots.txt suggested by znurgl would of course also block it. – unor Dec 26 '15 at 02:41
  • @unor No, that was not intentional. I'm confused now :/ What am I supposed to do? – Raj Sandhu Dec 26 '15 at 08:07
  • @RajSandhu: Well, you removed the trailing slash now from your question. If your actual URL really is `example.com/browse` (without trailing slash), then znurgl's robots.txt will work. – unor Dec 26 '15 at 13:10
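
One caveat on the follow-up question in the comments: dropping the trailing slash would *not* block only the "browse" page. `Disallow: /browse` is a prefix match, so it would block `/browse` *and* every URL under it, such as `/browse/abcd`. If you ever needed to block only the exact `/browse` URL, Googlebot (and other parsers following RFC 9309) support a `$` end-of-URL anchor. A sketch, assuming that extension is supported by the crawlers you care about:

```
User-agent: *
Disallow: /browse$
```

This blocks exactly `/browse` while leaving `/browse/abcd` and the other subpages crawlable, the opposite of the setup asked about in the question.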