Java and AngularJS on Google App Engine.

As for why: although I've been assured that most crawlers can parse JavaScript sites, they aren't fully parsing my AngularJS site and therefore aren't indexing it properly. I've created a static version of the site and want to redirect to it conditionally based on the user-agent. This works for every URL except the root of my site, i.e. localhost:8080 with or without a trailing slash.

I think it's because the url-pattern for the Tuckey UrlRewriteFilter in my web.xml is /*, so the filter doesn't get triggered without the trailing slash? I've tried changing that, though, along with everything else I could think of: changing the servlet version to 3.0, using "welcome-file", using an empty string for url-pattern, and so on.

Thank you for your help.

Urlrewrite.xml:

<?xml version="1.0" encoding="utf-8"?>
<!DOCTYPE urlrewrite PUBLIC "-//tuckey.org//DTD UrlRewrite 4.0//EN"
        "http://www.tuckey.org/res/dtds/urlrewrite4.0.dtd">

<urlrewrite use-query-string="true">

    <rule>
        <condition name="user-agent">
            facebookexternalhit/[0-9]|facebook|Googlebot|Googlebot-Mobile|
            Mediapartners-Google|AdsBot(.*)|AdSense(.*)|(.*)AdsBot|(.*)AdSense|
            Googlebot-Image|Googlebot-Video|Googlebot(.*)|
            FacebookExternalHit/[0-9]|Mediapartners-Google|AdsBot-Google
            |facebookexternalhit/1.0|FacebookExternalHit/1.1|
            FacebookExternalHit/1.0|facebookexternalhit/1.1|Facebot|Twitter|Twitterbot|Pinterest
        </condition>
        <from>^/(.*)$</from>
        <to>/staticview.jsp</to>
    </rule> 
</urlrewrite>

web.xml:

<web-app version="2.5" xmlns="http://java.sun.com/xml/ns/javaee"
         xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance"
         xsi:schemaLocation="http://java.sun.com/xml/ns/javaee http://java.sun.com/xml/ns/javaee/web-app_2_5.xsd">

  <filter>
      <filter-name>UrlRewriteFilter</filter-name>
      <filter-class>org.tuckey.web.filters.urlrewrite.UrlRewriteFilter</filter-class>
  </filter>
  <filter-mapping>
      <filter-name>UrlRewriteFilter</filter-name>
      <url-pattern>/*</url-pattern>
  </filter-mapping>

  <filter-mapping>
    <filter-name>appstats</filter-name>
    <url-pattern>/*</url-pattern>
  </filter-mapping>
  <filter>
    <filter-name>appstats</filter-name>
    <filter-class>com.google.appengine.tools.appstats.AppstatsFilter</filter-class>
    <init-param>
      <param-name>calculateRpcCosts</param-name>
      <param-value>true</param-value>
    </init-param>
  </filter>
  <servlet>
    <servlet-name>appstats</servlet-name>
    <servlet-class>com.google.appengine.tools.appstats.AppstatsServlet</servlet-class>
  </servlet>
  <servlet-mapping>
    <servlet-name>appstats</servlet-name>
    <url-pattern>/appstats/*</url-pattern>
  </servlet-mapping>
  <security-constraint>
    <web-resource-collection>
      <web-resource-name>appstats</web-resource-name>
      <url-pattern>/appstats/*</url-pattern>
    </web-resource-collection>
    <auth-constraint>
      <role-name>admin</role-name>
    </auth-constraint>
  </security-constraint>

  <servlet>
    <servlet-name>rss</servlet-name>
    <servlet-class>com.byron.common.controller.RSSServlet</servlet-class>
  </servlet>
  <servlet-mapping>
    <servlet-name>rss</servlet-name>
    <url-pattern>/rss</url-pattern>
  </servlet-mapping>
  <servlet>
    <servlet-name>rssfull</servlet-name>
    <servlet-class>com.byron.common.controller.FullRSSServlet</servlet-class>
  </servlet>
  <servlet-mapping>
    <servlet-name>rssfull</servlet-name>
    <url-pattern>/rssfull</url-pattern>
  </servlet-mapping>
  <servlet>
    <servlet-name>sitemap</servlet-name>
    <servlet-class>com.byron.common.controller.SitemapServlet</servlet-class>
  </servlet>
  <servlet-mapping>
    <servlet-name>sitemap</servlet-name>
    <url-pattern>/sitemap</url-pattern>
  </servlet-mapping>
  <servlet>
    <servlet-name>Jersey REST Service</servlet-name>
    <servlet-class>com.sun.jersey.spi.container.servlet.ServletContainer</servlet-class>
    <init-param>
      <param-name>com.sun.jersey.config.feature.DisableWADL</param-name>
      <param-value>true</param-value>
    </init-param>
    <!--
    Please try to declare your resource classes statically in your Application implementation as
    follows in order to minimize the startup time of your application.
    -->
    <init-param>
      <param-name>javax.ws.rs.Application</param-name>
      <param-value>com.byron.common.controller.Resources</param-value>
    </init-param>
    <load-on-startup>1</load-on-startup>
  </servlet>
  <servlet-mapping>
    <servlet-name>Jersey REST Service</servlet-name>
    <url-pattern>/rest/*</url-pattern>
  </servlet-mapping>
</web-app>
Kafe
  • As an update: no solution, but some progress. My concern remains that when I hit the root of my site, the request doesn't seem to go through Tuckey's UrlRewriteFilter at all. I've had mixed results from adding a "welcome-file" entry of `posts` to web.xml. With user-agent "google", the root is then redirected through "posts" to my static version, which is the only way I've achieved root -> static. However, "posts" isn't a real endpoint, so without the agent I just get "403 (forbidden)" from the root, and it seems circuitous anyway. – Kafe Jan 15 '18 at 15:52
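
For readers following along, the welcome-file experiment described in the comment above would look roughly like this in web.xml. This is only a sketch reconstructed from the comment: the exact markup Kafe used is not shown in the thread, and `posts` is the placeholder path mentioned there, not a real endpoint.

```xml
<!-- Sketch only: goes inside <web-app> in web.xml -->
<welcome-file-list>
    <!-- "posts" is the value named in the comment; it is not a real endpoint -->
    <welcome-file>posts</welcome-file>
</welcome-file-list>
```

As the comment notes, this only helps crawler requests; a regular browser hitting the root still ends up at a path that nothing serves.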

1 Answer

Try making an explicit rule mapping for the root, like so:

<rule>
    <from>^\/?.*$</from>
    <to>[your mapping goes here]</to>
</rule>

(This rule assumes you're using regexps, not wildcards.)

I have it in my app, and it catches localhost:8080 calls.
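
Applied to the question's setup, the rule might look something like the sketch below. It is untested against the asker's app, so treat it as a starting point rather than a confirmed fix. The user-agent list is shortened for illustration and kept on a single line so that no line breaks or indentation end up inside the pattern. The `match-type` and `last` attributes are optional and are included on the assumption that the 4.0 DTD referenced in the question is in use.

```xml
<?xml version="1.0" encoding="utf-8"?>
<!DOCTYPE urlrewrite PUBLIC "-//tuckey.org//DTD UrlRewrite 4.0//EN"
        "http://www.tuckey.org/res/dtds/urlrewrite4.0.dtd">

<urlrewrite use-query-string="true">
    <!-- match-type="regex" is the default; stated here to make the assumption explicit -->
    <rule match-type="regex">
        <!-- Shortened crawler list, for illustration only -->
        <condition name="user-agent">Googlebot|facebookexternalhit|Twitterbot</condition>
        <!-- Pattern from this answer: the leading slash is optional, so "/" and "" both match -->
        <from>^\/?.*$</from>
        <!-- Forward to the static page from the question and stop processing further rules -->
        <to last="true">/staticview.jsp</to>
    </rule>
</urlrewrite>
```

If the root still bypasses the rule with this in place, the request is probably not reaching the filter at all, which points back at the filter mapping or welcome-file handling in web.xml rather than at the rule itself.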