First and foremost: which crawlers are trying to access those paths? Are they popular ones (e.g. Googlebot, Bingbot, Yahoo! Slurp) or some other bots? Your best bet is to identify which crawlers are the "offenders" and then figure out why they're following those links. It's very difficult to tell you how to prevent this without making a bunch of assumptions.
Read on to see just how many assumptions can be made:
Suppose that there are two types of crawlers out there:
- Smart ones: they don't look for URLs in JavaScript, because it's very inefficient and may result in pointless attempts to crawl things that are complete nonsense (such as http://link.to.other/javascript/stuff.js). However, these crawlers may be executing the JavaScript.
- Dumb ones: they may get the HTML content and apply a regex to extract all URLs. Such crawlers most likely don't even execute your JavaScript.
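To make the "dumb" case concrete, here is a minimal sketch of what such a crawler might do. The regex and the sample page are assumptions made up for illustration, not taken from any real crawler; the point is only that a pattern applied to the raw HTML cannot tell a real link from a string that happens to sit inside a script block.

```python
import re

# Assumed, deliberately naive pattern: anything that looks like an http(s) URL.
URL_PATTERN = re.compile(r'https?://[^\s"\'<>)]+')

def extract_urls(html):
    """Return every URL-looking substring, with no regard for where it appears."""
    return URL_PATTERN.findall(html)

# Hypothetical page: one real link plus a URL inside a JavaScript string.
page = '''<html><body>
<a href="http://www.example.com/">Link text</a>
<script>
var unused = "http://link.to.other/javascript/stuff.js";
</script>
</body></html>'''

urls = extract_urls(page)
for url in urls:
    print(url)
```

Both the anchor's target and the string inside the script come back, which is how a crawler like this ends up requesting paths that were never meant to be links.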
Having JavaScript execution capability in a crawler is quite complicated, so I would expect very few crawlers out there to have it, and those that do are professional-grade crawlers. Professional-grade crawlers will most likely support robots.txt as well as things like "nofollow" in an anchor element's rel attribute:
<a href="http://www.example.com/" rel="nofollow">Link text</a>
I would bucket those in the "smart" crawler group. Most of the popular bots are pretty smart and they're also polite so you don't have to worry about them so much.
Does the JavaScript modify the document in a way that produces a hyperlink of some sort? If yes, then a smart crawler can pick up the link, but a dumb crawler most likely won't, because it is a lot less likely to execute the JavaScript.
So what can you do then? Well, for smart crawlers you should apply all of the standard politeness policies: robots.txt, "nofollow", etc. Most of the time that should be sufficient to prevent them from crawling those links. You want to be nice to them anyway, since they're probably helpful to your site (i.e. they're going to drive traffic to it based on your content).
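If you go the robots.txt route, you can sanity-check your rules before deploying them. The sketch below uses Python's standard urllib.robotparser; the rule set, the /javascript/ prefix and the "ExampleBot" agent name are assumptions for illustration, so substitute the paths that are actually being crawled.

```python
from urllib.robotparser import RobotFileParser

# Hypothetical rules: keep every crawler out of /javascript/.
ROBOTS_TXT = """\
User-agent: *
Disallow: /javascript/
"""

parser = RobotFileParser()
parser.parse(ROBOTS_TXT.splitlines())

def is_allowed(url, agent="ExampleBot"):
    """Return True if a crawler that honours robots.txt may fetch url."""
    return parser.can_fetch(agent, url)

blocked = is_allowed("http://www.example.com/javascript/stuff.js")
allowed = is_allowed("http://www.example.com/index.html")
print(blocked, allowed)
```

This only tells you what a polite crawler should do. A crawler that ignores robots.txt will not be stopped by it.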
For the dumb crawlers you might have to test out a few different solutions: obfuscate the URL or employ one of several strategies to detect them. You can do all kinds of things once you detect them: some are nice, some are not so nice :).
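As one possible way to obfuscate a URL, you could have your server emit it as a concatenation of short string pieces, so that the full URL never appears literally in the page source. This is only a sketch under that assumption: js_concat_expression is a hypothetical helper, the chunk size is arbitrary, and the regex stands in for whatever a naive crawler might use.

```python
import json
import re

# Assumed stand-in for a naive crawler's URL regex.
NAIVE_URL_PATTERN = re.compile(r'https?://[^\s"\'<>)]+')

def js_concat_expression(url, chunk_size=4):
    """Return a JavaScript expression that rebuilds url from short string pieces.

    chunk_size must stay small enough that "http://" is split across pieces.
    """
    chunks = [url[i:i + chunk_size] for i in range(0, len(url), chunk_size)]
    return " + ".join(json.dumps(chunk) for chunk in chunks)

url = "http://link.to.other/javascript/stuff.js"
expression = js_concat_expression(url)
script = "<script>var target = " + expression + ";</script>"

print(script)
print(NAIVE_URL_PATTERN.findall(script))
```

The browser still ends up with the full URL once the script runs, so this does nothing against a crawler that executes JavaScript. It only hides the URL from regex-style extraction.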
Again, you can see that without further information we have to make A LOT of assumptions. So you should either give us more information or at least try to analyze the information yourself, keeping the above questions and ideas in mind.
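A reasonable starting point for that analysis is your web server's access log. The sketch below counts user agents per path prefix; it assumes the common "combined" log format, and the sample lines, the /javascript/ prefix and the "ExampleDumbBot" agent are made up for illustration.

```python
import re
from collections import Counter

# Assumes the "combined" log format:
# ... "METHOD path HTTP/x" status size "referer" "user-agent"
LOG_LINE = re.compile(
    r'"(?:GET|HEAD|POST) (?P<path>\S+) [^"]*" \d{3} \S+ "[^"]*" "(?P<agent>[^"]*)"'
)

def count_agents(log_lines, path_prefix):
    """Count user agents that requested paths starting with path_prefix."""
    counts = Counter()
    for line in log_lines:
        match = LOG_LINE.search(line)
        if match and match.group("path").startswith(path_prefix):
            counts[match.group("agent")] += 1
    return counts

# Hypothetical log excerpt.
sample_log = [
    '203.0.113.5 - - [10/Oct/2023:13:55:36 +0000] "GET /javascript/stuff.js HTTP/1.1" 404 162 "-" "ExampleDumbBot/0.1"',
    '203.0.113.5 - - [10/Oct/2023:13:55:37 +0000] "GET /javascript/other.js HTTP/1.1" 404 162 "-" "ExampleDumbBot/0.1"',
    '198.51.100.7 - - [10/Oct/2023:13:56:01 +0000] "GET /index.html HTTP/1.1" 200 5120 "-" "Mozilla/5.0 (compatible; Googlebot/2.1; +http://www.google.com/bot.html)"',
]

offenders = count_agents(sample_log, "/javascript/")
for agent, hits in offenders.most_common():
    print(hits, agent)
```

Keep in mind that the user-agent string is supplied by the client and can be spoofed, so treat the result as a hint rather than proof of who is crawling.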