
I have an AngularJS app on an Apache web server that I would like search engines (e.g. the Google and Bing bots) to index. I have a PhantomJS script that crawls my site and takes snapshots of its pages, and I have followed Google's instructions on how to redirect any http://mysite.com/?_escaped_fragment_=* requests to the appropriate pages.

The problem I'm facing is that a few routes in the app change their content based on the anchor, e.g. http://mysite.com/#!/about is different from http://mysite.com/#!/about#overview. I would like these variants to be indexed, but in .htaccess the hash character '#' starts a comment, and even escaping it with a backslash doesn't work. I have consulted other SO answers (e.g. Apache rewrite condition for ajax crawling and mod_rewrite page anchor), but I have not found instructions on how to deal with anchors.

I have two questions.

  1. Is there a way to redirect URLs using mod_rewrite to snapshots that include anchors? For example, using the escaped version of '#' ('%23'):

    http://mysite.com/?_escaped_fragment_=about%23overview => http://mysite.com/snapshots/about#overview.html
    

    Here's what I currently have in my .htaccess file, though it does not work for pages with anchors:

    RewriteEngine On                                                                
    Options +FollowSymLinks                                                         
    
    # Route for the index page
    RewriteCond %{QUERY_STRING} ^_escaped_fragment_=/$                              
    RewriteRule ^(.*)$ snapshots/index.html [NC,L]  
    
    # All other routes                                
    RewriteCond %{QUERY_STRING} ^_escaped_fragment_=/?(.*)$                         
    RewriteRule ^(.*)$ snapshots/%1.html [NC,L]                                     
    
  2. If (1) is not possible, my idea is to replace every '#' with '.' in the file names of the snapshots. I would then need a mod_rewrite rule that replaces '#' (encoded as '%23') with '.' in the _escaped_fragment_ query parameter. Going back to my example, I currently have a hard-coded rule that takes /?_escaped_fragment_=/about%23overview and reroutes it to /snapshots/about.overview.html:

    RewriteCond %{QUERY_STRING} ^_escaped_fragment_=/about%23overview$              
    RewriteRule ^(.*)$ snapshots/about.overview.html [NE,NC,L]                      
    

    Is there a simple general rule I could use to implement this type of routing?

Any other ideas for how to solve this problem with general rewrite conditions would be appreciated, thanks!

mdml

1 Answer


I believe the following rule should work for you:

    RewriteCond %{QUERY_STRING} ^_escaped_fragment_=([^&]+) [NC]
    RewriteRule ^$ /snapshots/%1.html? [R,NE,L]

It redirects /?_escaped_fragment_=about%23overview to /snapshots/about%23overview.html
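
For the second option in the question (snapshot files named with '.' in place of '#'), a general rule could be sketched as follows. This is an untested sketch, and it assumes that each fragment contains at most one encoded '#', that the rules live in the document-root .htaccess, and that snapshot files follow the about.overview.html naming from the question. Since %{QUERY_STRING} is not URL-decoded, the pattern matches the literal text %23:

```apache
# Fragment with an anchor: /?_escaped_fragment_=/about%23overview
#   -> snapshots/about.overview.html
# %1 is the route before the encoded '#', %2 is the anchor after it.
RewriteCond %{QUERY_STRING} ^_escaped_fragment_=/?([^&%]+)%23([^&%]+)$ [NC]
RewriteRule ^$ snapshots/%1.%2.html [L]
```

If this is combined with a catch-all rule like the one in the question, it would need to come first so that fragments with anchors are handled before the catch-all sees them. Note that this is an internal rewrite rather than a redirect, so the snapshot is served directly at the _escaped_fragment_ URL.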

anubhava
  • I tried this on my browser and it appended the absolute path to the snapshots directory into the request: e.g. http://mysite.com/var/www/html/snapshots/about%23overview.html. If I remove the redirect R flag, though, it works. I then tried "Fetching as Google," and got a 404 not found. If I put the redirect back in, it works for Google though. Any ideas? – mdml Oct 04 '13 at 18:15
  • When you say `I then tried "Fetching as Google` what was your original URL that you tried? – anubhava Oct 04 '13 at 18:18
  • I thought original URL is something like `http://mysite.com/?_escaped_fragment_=...` – anubhava Oct 04 '13 at 18:21
  • Ah yes, but the [Fetch as Google](https://support.google.com/webmasters/answer/158587?hl=en) tool lets you instruct a Google bot to visit your page as if it were indexing it. Thus the Google bot will transform the request into `http://mysite.com/?_escaped_fragment_=...`. The 404 not found error says `/snapshots//about%23overview.html` was not found on the server. – mdml Oct 04 '13 at 18:25
  • let us [continue this discussion in chat](http://chat.stackoverflow.com/rooms/38635/discussion-between-mtitan8-and-anubhava) – mdml Oct 04 '13 at 18:33
  • Fetch as Google does NOT work as the actual crawler does for reading fragments of Angular apps. https://productforums.google.com/forum/#!topic/webmasters/2Gs_eGQVw_k – Du3 Oct 16 '14 at 19:07