
On a site I'm currently working on, we have the following set up:

  • Angular JS frontend
  • ASP.NET MVC Web API backend, on IIS
  • Prerender.io caching service

With the following rewrite rules in the web.config:

<rule name="AngularJS" stopProcessing="true">
  <match url="(.*)" />
  <conditions logicalGrouping="MatchAll">
    <add input="{REQUEST_FILENAME}" matchType="IsFile" negate="true" />
    <add input="{REQUEST_FILENAME}" matchType="IsDirectory" negate="true" />
    <add input="{URL}" pattern="^/sitemap.xml" negate="true" />
    <add input="{URL}" pattern="^/robots.txt" negate="true" />
    <add input="{QUERY_STRING}" pattern="_escaped_fragment_" negate="true" />
    <add input="{HTTP_USER_AGENT}" pattern="facebook" negate="true" />
  </conditions>
  <action type="Rewrite" url="/" />
</rule>

We've got the <meta name="fragment" content="!"> tag in index.html, AngularJS is running with HTML5 pushState enabled, and there's an HttpModule on the backend that picks up requests that either have the _escaped_fragment_ query string or an appropriate User-Agent and serves prerender.io content.
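For context, the decision the backend module makes is roughly the following. This is a hand-written JavaScript sketch, not the actual C# from the Prerender_asp_mvc module; the function name and crawler list are illustrative only:

```javascript
// Sketch: decide whether a request should be served prerender.io content.
// The crawler list is illustrative, not the module's actual list.
const crawlerUserAgents = ['googlebot', 'bingbot', 'facebookexternalhit'];

function shouldPrerender(url, userAgent) {
  // AJAX-crawling scheme: crawlers that honour <meta name="fragment"
  // content="!"> re-request the page with ?_escaped_fragment_= appended.
  if (url.includes('_escaped_fragment_')) return true;

  // Fallback: match known crawler User-Agent strings directly.
  const ua = (userAgent || '').toLowerCase();
  return crawlerUserAgents.some(bot => ua.includes(bot));
}
```

The Facebook issue below comes down to which of these two branches the Facebook crawler actually triggers.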

With regular search engine crawlers, this works as expected - in prerender.io, we can see the crawlers request the correct URL and get served the appropriate cached content.

However, with Facebook, no matter what URL I test on https://developers.facebook.com/tools/debug/og/object/, prerender.io is asked to serve the root page (i.e. https://example.com/).

For now, I've worked around it by excluding Facebook from the IIS rewrite with:

        <add input="{HTTP_USER_AGENT}" pattern="facebook" negate="true" />            

However, I am at a loss as to why Google can hit the correct pages via my current setup while Facebook cannot. I read somewhere that 301s and 302s might be handled differently - could this be a case of needing redirects rather than rewrites?
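To make the distinction concrete: the rule above uses `<action type="Rewrite">`, which IIS handles internally, so the crawler never sees a 3xx status and the address bar URL is preserved. A redirect would instead send an actual 301/302 back to the client. A hypothetical redirect variant of the same action would look like:

```xml
<!-- Current: rewritten internally by IIS; the crawler sees a 200 for the original URL -->
<action type="Rewrite" url="/" />

<!-- Alternative: the client receives an HTTP 301 (Permanent) or 302 (Found) -->
<action type="Redirect" url="/" redirectType="Permanent" />
```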

Henry C
  • Can you add example URLs to your site? Including one with the #! format, so that we can test that, and whether you correctly respond to the _escaped_fragment_ method? – Roemer May 18 '15 at 13:35
  • We don't normally have #! URLs (as we have HTML5 mode enabled in Angular - $locationProvider.html5Mode(true); ) - they only get used in IE9, where Angular handles it automatically. We definitely respond correctly to _escaped_fragment_ requests (as it works fine for Bing and Google crawlers) - we're using this IIS module: https://github.com/greengerong/Prerender_asp_mvc – Henry C May 19 '15 at 11:05
  • I can't really mess around with the production instance of the site (and linking to it won't help, since Facebook is currently being excluded from the whole prerender flow), but I might be able to stand up something on a test build later – Henry C May 19 '15 at 11:18
  • Do you at least have a sample page where you can reproduce it? Just a simple case that shows the behaviour? Otherwise it's really hard to help you. – Roemer May 19 '15 at 11:41
  • I'm trying to replicate the conditions we had three months ago. I did stand up a test site, but I'm not able to fully replicate the issue yet (at the moment Fetch as Google / UA emulation via Chrome isn't returning what I'd expect either), so you'll have to bear with me until I can replicate it. (Obviously, after three months of inactivity I didn't expect to hear from anyone, so I didn't make much effort to preserve the exact state.) – Henry C May 19 '15 at 17:07
  • Alright, I've set up a test case / example. The actual test page / site is https://cerberus-mvp-bootstrap.azurewebsites.net/profile/supplier - you can see in the source at the top we have the fragment meta declaration, which causes Googlebot etc. to request https://cerberus-mvp-bootstrap.azurewebsites.net/profile/supplier?_escaped_fragment_= which then goes through prerender.io – Henry C May 20 '15 at 10:08
  • On https://developers.facebook.com/tools/debug/og/object/ , everything works as expected - it gets the correct page if you pass it https://cerberus-mvp-bootstrap.azurewebsites.net/profile/supplier?_escaped_fragment_= - however, if you share the link https://cerberus-mvp-bootstrap.azurewebsites.net/profile/supplier on Facebook / on your wall, it doesn't attempt to retrieve the ?_escaped_fragment_ version, so you end up with the homepage. Hopefully this explains the issue (the FB crawler / sharing links on FB does not seem to use _escaped_fragment_ when retrieving SPA page links). – Henry C May 20 '15 at 10:11
  • Ok, thanks. If you share the URL, which ends with /supplier, why would FB try to fetch it with ?_escaped_fragment_ instead? It will only try to fetch the escaped version if the hash-bang notation (#!) is present. Do you have an example URL with #! in it? – Roemer May 20 '15 at 10:38
  • We use HTML5 mode in Angular, which uses pushState (so by default most users *don't* see #! URLs - only IE9 / browsers that don't support pushState). So no, we don't have example URLs with #! in them, because we're using HTML5 mode in Angular. – Henry C May 20 '15 at 10:51
  • https://developers.google.com/webmasters/ajax-crawling/docs/specification - see "Pages without hash fragments" – Henry C May 20 '15 at 10:55
  • Then I don't get your question. You ask, "why is FB always hitting the root?". The URL that you are sharing -is- the root URL. What other URL should FB hit, then? What -exact- URL do you expect FB to hit? How do you share specific URLs? You provide more entry points than just your root, right? – Roemer May 20 '15 at 10:56
  • 1
    Let us [continue this discussion in chat](http://chat.stackoverflow.com/rooms/78306/discussion-between-henry-c-and-flaxfield). – Henry C May 20 '15 at 10:56
  • Hello @HenryC, I had the same problem, and I also had a CDN in front of the website that filtered the User-Agent, so I couldn't even act when the Facebook crawler was trying to download the page. I'm only using _escaped_fragment_ to serve the cached page snapshot, and everything works fine when I use the hashbang; however, when using HTML5 pushState, Facebook wouldn't request the page with the _escaped_fragment_ parameter. How did you solve it? My solution was to build a dynamic index.html and put the correct URL in the og:url meta tag so that the Facebook crawler will follow it. – demetrio812 Jan 08 '16 at 08:41
  • I never did fix it - we still have the workaround where Facebook doesn't follow the same route as other crawlers (like Google and Bing) – Henry C Jan 08 '16 at 13:55

0 Answers