As noted elsewhere, when Google crawls a WordPress site with Disqus enabled, it attempts to index some links that are dynamically generated by Disqus. These links do not appear in the page source, so I presume they are JavaScript-based.

So for example, Google Webmaster Tools attempts to crawl URL (A) below and throws a Page Not Found error because the correct URL (B) has been modified by Disqus:

(A) www.example.com/blog/2012/09/blog-post-title/2147423647/1346789815000

(B) www.example.com/blog/2012/09/blog-post-title/

The dynamic URL created by Disqus always includes the "2147423647" component; these digits do not change. The "1346789815000" portion may or may not be present, and its digits change from page to page.

I'd like to use mod_rewrite so that requests for URLs of the form (A) get a 301 redirect to (B), so that I stop getting crawl errors.

Please advise. Note that I'm a mod_rewrite noob, so any and all help is appreciated! Thanks in advance.

1 Answer

Using mod_rewrite, try:

RewriteEngine On
RewriteRule ^([0-9]{4})/([0-9]{2})/([^/]+)/[0-9]+ /$1/$2/$3/ [L,R=301]

Make sure these are placed above any rules you already have for processing the SEO-friendly URLs (for example, the WordPress permalink rules).

Jon Lin
  • Thanks, Jon. This RewriteRule seems to work on URLs of the format: www.example.com/blog/2012/09/blog-post-title/2147423647 but not if you add another slash and another group of digits. How would I modify this? Thanks again. – user1697738 Sep 25 '12 at 19:41
  • I found a fix. First, I had to put the {4} inside the first set of parentheses. Then I had to add /blog before the /$1. It seems to be working as intended now. Thanks again. – user1697738 Sep 25 '12 at 21:27
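
Putting the answer and the follow-up comments together, a combined rule might look something like the sketch below. It assumes the .htaccess file sits in the site's document root and that post URLs start with /blog/, as in the question's examples; neither detail is confirmed in the thread. If the file lives inside the blog directory instead, the leading `blog/` would be dropped from the pattern while the `/blog` prefix in the target stays. The trailing `(/[0-9]+)?` group is an addition meant to cover the optional second number described in the question.

```apache
# Sketch only. Assumes .htaccess is in the document root and posts live under /blog/.
RewriteEngine On

# Matches blog/YYYY/MM/post-slug/2147423647, optionally followed by
# /<digits> and a trailing slash, and redirects to the canonical post URL.
RewriteRule ^blog/([0-9]{4})/([0-9]{2})/([^/]+)/2147423647(/[0-9]+)?/?$ /blog/$1/$2/$3/ [R=301,L]
```

Anchoring the pattern with `$` and matching the fixed "2147423647" segment literally keeps the rule from redirecting other URLs that merely end in digits, such as paginated post URLs.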