
Given the following URLs:

  • example.com/products
  • example.com/products#/page-2
  • example.com/products#/page-3
  • ...

Using the robots.txt file, the first URL (example.com/products) is supposed to be indexed, while every other one should be blocked from being indexed. How can this be done?

None of the following attempts work in the desired manner:

  • Noindex: /products#/page-*
  • Noindex: /products\#/page-*
  • Noindex: /*/page-*
  • Noindex: /*#/page-*
  • Noindex: /*\#/page-*
Marco
    I'm voting to close this question as off-topic because this is about SEO, not programming. Try [sf] or [webmasters.se]. – Barmar Dec 07 '18 at 21:15

2 Answers


/products#/page is not a unique page. The actual URL is simply /products.

# is abused as a hook by JavaScript frameworks that dynamically load other pages, but normally /products#/page means that your /products page has an element such as <a name="/page">, and you can't block specific elements.
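
To illustrate the point, here is a minimal sketch using the standard URL API (assuming a Node or browser environment): the fragment is parsed purely on the client and never becomes part of the HTTP request, so a robots.txt rule only ever sees /products.

    // Minimal sketch: the fragment is a client-only component of the URL.
    const url = new URL("https://example.com/products#/page-2");

    console.log(url.pathname); // "/products"  <- the only path a robots.txt rule can match
    console.log(url.hash);     // "#/page-2"   <- stays in the browser, never sent to the server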

SPAs break the web. You're better off creating real, independent pages.

Evert

Everything after # is called the "anchor" (formally, the URL fragment). This information is NOT transferred to the server, so you cannot read it from PHP or any other language that is executed on the server side.
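
Here is a minimal sketch of the same point from the server's perspective, using Node's built-in http module rather than PHP simply to keep the examples in one language: whatever runs on the server, a request for example.com/products#/page-2 arrives without the fragment.

    import http from "node:http";

    // Hypothetical test server: log what the server actually receives.
    http.createServer((req, res) => {
      console.log(req.url); // "/products" -- the "#/page-2" part never arrives
      res.end("ok");
    }).listen(8080);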

As @Evert outlines, the anchor is commonly abused with JavaScript, because it can be modified WITHOUT an actual page reload, which makes it possible to generate deep links for dynamic content. (These links work because client-side JavaScript uses AJAX to dynamically load content based on the anchor; see the sketch below.)
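
A minimal sketch of that pattern (the /api/products endpoint and the #products container are hypothetical names, not anything from the question): a hashchange listener reads the fragment and fetches the matching content via AJAX, without a full page load.

    // Runs in the browser: react to changes of the anchor/fragment.
    window.addEventListener("hashchange", async () => {
      const page = window.location.hash.replace("#/", ""); // e.g. "page-2"

      // Only this AJAX request reaches the server; the "#/page-2" fragment itself never does.
      const response = await fetch(`/api/products?page=${encodeURIComponent(page)}`);
      document.querySelector("#products")!.innerHTML = await response.text();
    });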

dognose