In Meteor, I have installed the `spiderable` package, which allows the application to be crawled by search engines. However, I want to exclude certain paths from being crawled.
For example, `example.com/abc/[path]` should not be crawled, whereas `example.com/[path]` should be.
I am unsure how to do this. One guess is to include a `robots.txt` file in the `/public` directory and use regex as described here. However, the URL doesn't contain the `#!` as it did in that question. Is that relevant?
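For reference, a minimal `robots.txt` along these lines might look like the following, placed in `/public` so Meteor serves it from the site root (this assumes crawlers honor `Disallow` path prefixes; the exact rules I'd need are the part I'm unsure about):

```
User-agent: *
Disallow: /abc/
```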
My current implementation is a bit more complicated, and it's based on the following quote from the package's `README.md`:

In order to have links between multiple pages on a site visible to spiders, apps must use real links (eg `<a>` tags) rather than simply re-rendering portions of the page when an element is clicked.
At the moment, when the page is rendered, I test whether the path begins with `/abc`, and if it does, I set a persistent session variable. This lets me strip the `/abc` prefix from all links in my pages. When a link is clicked, an `onBeforeAction()` function checks whether the session variable is set and, if so, prepends the prefix to the path so that the right template is rendered. In doing so, I am hoping those links won't be visible to the spider, but I am unsure how reliable this method is.
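The prefix handling described above could be sketched with two pure helpers like these (hypothetical names; the actual `Session` and `onBeforeAction()` wiring is omitted, since that part depends on the router):

```javascript
// Strip a section prefix from a path when rendering links,
// so that pages under /abc link to /[path] instead of /abc/[path].
function stripSectionPrefix(path, prefix = '/abc') {
  return path.startsWith(prefix + '/') ? path.slice(prefix.length) : path;
}

// Re-add the prefix inside the route hook (e.g. onBeforeAction)
// when the persistent session flag for the /abc section is set.
function addSectionPrefix(path, prefix = '/abc') {
  return prefix + path;
}
```

In the real app, `stripSectionPrefix` would run wherever link `href`s are generated, and `addSectionPrefix` would run in the `onBeforeAction()` hook only when the session variable is set.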
tl;dr - How do I exclude certain paths from being crawled in Meteor?