
Given the following URLs:

  • example.com/products
  • example.com/products#/page-2
  • example.com/products#/page-3
  • ...

Using the robots.txt file, the first URL (example.com/products) is supposed to be indexed, while every other one should be blocked from being indexed. How can this be done?

None of the following attempts work in the desired manner:

  • Noindex: /products#/page-*
  • Noindex: /products\#/page-*
  • Noindex: /*/page-*
  • Noindex: /*#/page-*
  • Noindex: /*\#/page-*
Marco
    I'm voting to close this question as off-topic because this is about SEO, not programming. Try [sf] or [webmasters.se]. – Barmar Dec 07 '18 at 21:15

2 Answers


/products#/page is not a unique page. The actual URL is simply /products.

# is abused as a hook by JavaScript frameworks that dynamically load other pages, but normally /products#/page means that your /products page has an element such as <a name="/page">, and you can't block specific elements.
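
To illustrate the point, here is a minimal sketch using the standard URL API (assuming a Node or browser environment): the fragment is parsed purely on the client and never becomes part of the HTTP request, so a robots.txt rule only ever sees /products.

    // Minimal sketch: the fragment is a client-only component of the URL.
    const url = new URL("https://example.com/products#/page-2");

    console.log(url.pathname); // "/products"  <- the only path a robots.txt rule can match
    console.log(url.hash);     // "#/page-2"   <- stays in the browser, never sent to the server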

SPAs break the web. You're better off creating real, independent pages.

Evert

Everything after # is called the "anchor" (formally, the URL fragment). This information is NOT transferred to the server, so you cannot read it from PHP or any other language that is executed on the server side.
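
Here is a minimal sketch of the same point from the server's perspective, using Node's built-in http module rather than PHP simply to keep the examples in one language: whatever runs on the server, a request for example.com/products#/page-2 arrives without the fragment.

    import http from "node:http";

    // Hypothetical test server: log what the server actually receives.
    http.createServer((req, res) => {
      console.log(req.url); // "/products" -- the "#/page-2" part never arrives
      res.end("ok");
    }).listen(8080);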

As @Evert outlines, the anchor is commonly abused with JavaScript, because it can be modified WITHOUT an actual page reload, which makes it possible to generate deep links for dynamic content. (These links work because client-side JavaScript uses AJAX to dynamically load content based on the anchor; see the sketch below.)
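
A minimal sketch of that pattern (the /api/products endpoint and the #products container are hypothetical names, not anything from the question): a hashchange listener reads the fragment and fetches the matching content via AJAX, without a full page load.

    // Runs in the browser: react to changes of the anchor/fragment.
    window.addEventListener("hashchange", async () => {
      const page = window.location.hash.replace("#/", ""); // e.g. "page-2"

      // Only this AJAX request reaches the server; the "#/page-2" fragment itself never does.
      const response = await fetch(`/api/products?page=${encodeURIComponent(page)}`);
      document.querySelector("#products")!.innerHTML = await response.text();
    });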

dognose