
I have an AngularJS application in which I want to share pages on Facebook. Sharing is handled with meta tags (https://developers.facebook.com/docs/sharing/best-practices), but I cannot change the meta tags with JavaScript because JavaScript isn't executed by Facebook's crawler. Therefore I want to use prerender.io to execute and render my pages before the crawler gets them from the server.

The thing is, I am not sure if I understand the documentation correctly (https://github.com/greengerong/prerender-java).

This is the example web.xml from the README.md on GitHub:

    <filter>
        <filter-name>prerender</filter-name>
        <filter-class>com.github.greengerong.PreRenderSEOFilter</filter-class>
        <init-param>
            <param-name>prerenderServiceUrl</param-name>
            <param-value>http://localhost:3000</param-value>
        </init-param>
        <init-param>
            <param-name>crawlerUserAgents</param-name>
            <param-value>me</param-value>
        </init-param>
    </filter>
    <filter-mapping>
        <filter-name>prerender</filter-name>
        <url-pattern>/*</url-pattern>
    </filter-mapping>
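
For reference, this is my mental model of what the filter does, as a minimal sketch in plain Java. This is NOT the library's actual code; the substring match on the User-Agent header and the proxying to the prerender service are my assumptions based on the README:

    // Sketch of how I understand the filter to work (my assumptions,
    // not the library's code).
    import java.io.BufferedReader;
    import java.io.InputStreamReader;
    import java.net.HttpURLConnection;
    import java.net.URL;

    public class PrerenderFilterSketch {

        // Assumed default when prerenderServiceUrl is not configured.
        static final String PRERENDER_SERVICE = "http://prerender.herokuapp.com";

        // Does the request come from one of the configured crawlers?
        static boolean isCrawler(String userAgent, String[] crawlerUserAgents) {
            if (userAgent == null) {
                return false;
            }
            for (String crawler : crawlerUserAgents) {
                if (userAgent.toLowerCase().contains(crawler.toLowerCase())) {
                    return true;
                }
            }
            return false;
        }

        // Fetch the fully rendered HTML for a page from the prerender service.
        static String fetchPrerendered(String pageUrl) throws Exception {
            URL url = new URL(PRERENDER_SERVICE + "/" + pageUrl);
            HttpURLConnection conn = (HttpURLConnection) url.openConnection();
            StringBuilder html = new StringBuilder();
            try (BufferedReader in = new BufferedReader(
                    new InputStreamReader(conn.getInputStream()))) {
                String line;
                while ((line = in.readLine()) != null) {
                    html.append(line).append('\n');
                }
            }
            return html.toString();
        }

        public static void main(String[] args) {
            String[] crawlers = {"facebookexternalhit"};
            String fbUa = "facebookexternalhit/1.1 "
                    + "(+http://www.facebook.com/externalhit_uatext.php)";
            System.out.println(isCrawler(fbUa, crawlers)); // true
        }
    }

If that model is right, then every request whose User-Agent contains one of the crawlerUserAgents values gets proxied to the service, which is relevant to what happened next.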

After a bunch of attempts to get things right, I found out that if I simply remove this part:

    <init-param>
        <param-name>prerenderServiceUrl</param-name>
        <param-value>http://localhost:3000</param-value>
    </init-param>

I don't have to deal with sockets on GAE (those gave me this error: 'Caused by: java.net.SocketException: Permission denied: ...'), and I can use the default service already deployed at http://prerender.herokuapp.com. Question 1) What are the pros/cons of using the default service vs. deploying my own?

Now the service seems to be working, and I don't get server errors - great!

As described in the documentation (https://github.com/greengerong/prerender-java), I first used 'me' as the crawler user agent. With 'me' configured, prerender started to cache my own API calls. E.g. when I was GETing a bunch of items from my server, prerender returned some HTML and cached the URI that should have returned the JSON I wanted. So now I have some cached pages at prerender.io, but not exactly the pages I want to cache :).
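
If the match really is a substring check (again, my assumption), that would explain it: an ordinary browser User-Agent already contains 'me'. For example:

    // Illustration of the suspected substring match (my assumption, not the
    // library's code): "Chrome" contains "me", so my own browser traffic matches.
    public class UserAgentCheck {
        public static void main(String[] args) {
            String browserUa = "Mozilla/5.0 (X11; Linux x86_64) AppleWebKit/537.36 "
                    + "(KHTML, like Gecko) Chrome/37.0.2062.94 Safari/537.36";
            System.out.println(browserUa.toLowerCase().contains("me")); // true
        }
    }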

So I changed crawlerUserAgents to this:

    <init-param>
        <param-name>crawlerUserAgents</param-name>
        <param-value>facebookexternalhit/1.1</param-value>
    </init-param>

(I've also tried facebookexternalhit, FacebookUserExternalHit, ...). Now no pages get cached on prerender.io at all, and the JavaScript isn't executed before Facebook's crawler gets the pages. Looking at the debugger (https://developers.facebook.com/tools/debug/og/object/), it tells me that the crawler only sees the original meta tags, not the meta tags that I replace with JavaScript on different pages (the meta tags are replaced correctly when I open my page and inspect the elements).
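
One way I can check this outside the Facebook debugger is to send a request with the crawler's User-Agent myself. A sketch: the page URL below is hypothetical, and the User-Agent string is the one from Facebook's docs:

    // Quick test: request my page pretending to be the Facebook crawler,
    // and check whether the returned HTML contains the prerendered meta tags.
    import java.io.BufferedReader;
    import java.io.InputStreamReader;
    import java.net.HttpURLConnection;
    import java.net.URL;

    public class CrawlerTest {
        public static void main(String[] args) throws Exception {
            URL url = new URL("http://myapp.example.com/some/page"); // hypothetical
            HttpURLConnection conn = (HttpURLConnection) url.openConnection();
            conn.setRequestProperty("User-Agent",
                    "facebookexternalhit/1.1 (+http://www.facebook.com/externalhit_uatext.php)");
            try (BufferedReader in = new BufferedReader(
                    new InputStreamReader(conn.getInputStream()))) {
                String line;
                while ((line = in.readLine()) != null) {
                    System.out.println(line); // look for the og: meta tags here
                }
            }
        }
    }

If the response is still the untouched SPA shell, the filter never matched; if it contains the prerendered meta tags, the problem is somewhere on Facebook's side.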

Question 2) Am I doing this right? Should I try other crawler user agents? Is facebookexternalhit correct?

stianlp
Comment from Prerender.io (Sep 25 '14): It seems there are some issues with GAE sockets: http://stackoverflow.com/questions/19321625/java-gae-openid4java-fails-while-doing-discovery-on-google-permission-denied
