
I would like to make my GWT app crawlable by the Google bot. I found this article: https://developers.google.com/webmasters/ajax-crawling/. It states there should be a servlet filter that serves a different view to the Google bot. But how can this work? If I use, for example, the Activities and Places pattern, then the page changes happen on the client side only and no servlet is involved, so a servlet filter does not seem to apply here.

Can someone give me an explanation? Or is there another good tutorial, tailored to GWT, that shows how to do this?

jan

2 Answers


If you use Activities & Places, your "pages" will have a bookmarkable URL (usually composed of the HTML host page, a #, and some tokens separated by ! or another character).

Thus, you can place links (<a> elements) in your application to make it crawlable. If a link contains the proper structure (the one with # and tokens), it will navigate to the proper Place.
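For example, a minimal sketch using GWT's Hyperlink widget (the "productList" token and the label are made-up placeholders; the token must match one your PlaceHistoryMapper understands):

```java
import com.google.gwt.user.client.ui.Hyperlink;
import com.google.gwt.user.client.ui.RootPanel;

public class NavigationLinks {
    public static void addProductsLink() {
        // Renders a real <a href="#productList"> element in the DOM, so a
        // crawler can discover the URL, while a user click still resolves
        // to the corresponding Place via the history token.
        Hyperlink link = new Hyperlink("Products", "productList");
        RootPanel.get().add(link);
    }
}
```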

Have a look at https://developers.google.com/web-toolkit/doc/latest/DevGuideMvpActivitiesAndPlaces

xxxxxxxxx
  • I know that part, but the problem is: the Google bot cannot execute the JavaScript. So I have to check whether the Google bot is looking at the site. This link (https://developers.google.com/webmasters/ajax-crawling/) suggests doing this with a servlet filter. But no servlet is called when the page changes happen on the client side. – jan Dec 27 '12 at 14:41
  • As you say, Google cannot execute the JavaScript, so you need to serve it static HTML pages. If you've put too much application logic in your client side (including HTML rendering), then you should use the headless browser approach (explained in https://developers.google.com/webmasters/ajax-crawling/docs/html-snapshot; see the sketch after these comments). It basically consists of rendering the HTML page on your server by executing the JavaScript, and then sending the Google bot the final generated HTML. – xxxxxxxxx Dec 27 '12 at 14:49
  • The other approach is to use unobtrusive JavaScript (which, with GWT, is difficult because you must avoid much of its functionality). That basically means that your HTML works even if the GWT JavaScript is not executed, so the Google bot can crawl it. That's where the Activities & Places URL schemes fit. – xxxxxxxxx Dec 27 '12 at 14:52
  • Yes, I'm aware of the headless browser approach and how to recognize the Google bot via the escaped fragment and so on. But the servlet filter part is not clear to me: the filter only runs when a servlet is called, and no servlet is called when just GWT code is executed. – jan Dec 27 '12 at 15:04
  • The first time you enter your app, the servlet would be invoked with the URL of your host page. If you detect that it is a Google bot request, you run your headless browser, which executes your GWT JavaScript and creates an HTML snapshot (with links to other places, encoded with the #! scheme). Then the bot asks again for those links, your filter intercepts them, and so on... The point is that the filter alone is not enough: your app must be designed in a way that can be crawled just by following links. For instance: if you need to double-click a list to navigate somewhere, then... – xxxxxxxxx Dec 27 '12 at 15:23
  • ...that will never be crawlable, no matter whether you place a filter, use a headless browser, or whatever. You can of course use a headless browser, detect that in your JavaScript, and render a fictitious link next to your list simulating the navigation for the Google bot. I don't know if that's clear enough... – xxxxxxxxx Dec 27 '12 at 15:25
  • And how do I invoke the servlet? By default, there is no servlet invoked if I just call the root of my application... – jan Dec 27 '12 at 15:28
  • Whenever you (or the Google bot) ask for an HTML page, a servlet is invoked. That servlet comes prepackaged with your Tomcat (or whatever server you use). Thus, if you declare a filter in your web.xml file, your filter will be invoked even if someone is requesting an HTML page. In fact, be prepared, because your filter will be invoked for every GWT RPC call, image, CSS file, etc. – xxxxxxxxx Dec 27 '12 at 15:32
  • I added this crawl filter to my web.xml and the doFilter method is only executed if I do an RPC call. If I just open my GWT app, it is not called. Am I doing something wrong here? <filter><filter-name>CrawlServlet</filter-name><filter-class>de.test.servlets.CrawlServlet</filter-class></filter> <filter-mapping><filter-name>CrawlServlet</filter-name><url-pattern>/*</url-pattern></filter-mapping> – jan Dec 27 '12 at 16:30
  • I mean, my page is an HTML page. If I request that page, no servlet is called? – jan Dec 27 '12 at 17:35
  • AFAIK it should work and call your filter. My filters get called for every resource, even images and CSS files. What server are you using? Tomcat? – xxxxxxxxx Dec 27 '12 at 19:25
  • Look at this (it also says it should work): http://tutorials.jenkov.com/java-servlets/servlet-filters.html – xxxxxxxxx Dec 27 '12 at 19:51
  • When you do figure it out, if it's not from an answer, could you post it as an answer, please? – dlamblin Dec 27 '12 at 19:53
  • @dlamblin: Sorry, which part do you suggest I post as an answer? I tried to move all these comments to chat, but jan has low reputation and the system doesn't allow me. I suppose we should finally edit this when we get to a solution. – xxxxxxxxx Dec 27 '12 at 20:12
  • @izaera Sorry izaera, my comment should have been directed at jan. It seems you have a clear suggestion that he's not sure how to work with, and I know he doesn't have the rep to edit the answer, so when he gets something working I'd like him to document it here instead of just dropping the issue, as many do. – dlamblin Dec 27 '12 at 20:25
  • I figured it out now: my app is running on App Engine and everything is treated as a static file (except RPC calls). I had to exclude my HTML file from the static files (added an <exclude> entry to appengine-web.xml). Then the servlet filter was working for the HTML file as well. Thank you very much for the help. I cannot answer my own question right now; I have to wait a couple of hours. – jan Dec 27 '12 at 21:18
  • May I suggest that you create a new question+answer for the servlet issue? I think it is a different issue and this one would be resolved with this answer, while the other (filter not called because of exclusion in AppEngine) is another question on its own. People arriving at this question will have to traverse all comments to get to your solution. @dlamblin: What would be the "correct" procedure now (I'm also new to StackOverflow)? – xxxxxxxxx Dec 27 '12 at 21:26
  • I hope to get a higher-level person to come in here and explain how this should be boiled down for documenting the problem and solution, but based on what you can do with your relative reputations, jan could edit his question to state both problems, the second as the follow-on. And @izaera could edit his answer to note the solutions to both. Then that could be accepted, and people driving by who find it helpful will start to vote both up. – dlamblin Dec 27 '12 at 23:58
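To make the headless-browser approach from the comments concrete: a minimal sketch using HtmlUnit, one of the libraries Google's HTML-snapshot docs suggest for Java. The two-second wait is an assumption; a real setup would tune it to the app's rendering time:

```java
import com.gargoylesoftware.htmlunit.WebClient;
import com.gargoylesoftware.htmlunit.html.HtmlPage;

public class SnapshotRenderer {
    // Loads the GWT host page in a headless browser, lets the JavaScript
    // run, and returns the resulting DOM as HTML for the crawler.
    public static String renderSnapshot(String url) throws Exception {
        try (WebClient webClient = new WebClient()) {
            webClient.getOptions().setThrowExceptionOnScriptError(false);
            HtmlPage page = webClient.getPage(url);
            // Give the GWT module time to finish rendering (assumed 2s here).
            webClient.waitForBackgroundJavaScript(2000);
            return page.asXml();
        }
    }
}
```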

So here is the solution to the actual problem:

I wanted to make my GWT app (running on Google App Engine) crawlable by the Google bot and followed this documentation: https://developers.google.com/webmasters/ajax-crawling/. I was trying to apply a servlet filter that intercepts every request to my app, checks for the special fragment that the Google bot adds to the escaped URL, and presents a special view to the bot using a headless browser.
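A minimal sketch of such a filter, assuming the _escaped_fragment_ convention from Google's scheme and delegating the snapshot to a helper like the HtmlUnit sketch in the comments above (the class name and helper are placeholders, not the exact code I used):

```java
import java.io.IOException;
import javax.servlet.Filter;
import javax.servlet.FilterChain;
import javax.servlet.FilterConfig;
import javax.servlet.ServletException;
import javax.servlet.ServletRequest;
import javax.servlet.ServletResponse;
import javax.servlet.http.HttpServletRequest;

public class CrawlFilter implements Filter {

    @Override
    public void doFilter(ServletRequest req, ServletResponse resp, FilterChain chain)
            throws IOException, ServletException {
        HttpServletRequest request = (HttpServletRequest) req;
        // The Google bot rewrites "#!token" URLs to "?_escaped_fragment_=token".
        String fragment = request.getParameter("_escaped_fragment_");
        if (fragment == null) {
            // Normal browser request: serve the GWT host page as usual.
            chain.doFilter(req, resp);
            return;
        }
        // Bot request: rebuild the original #! URL and serve a pre-rendered snapshot.
        String url = request.getRequestURL() + "#!" + fragment;
        resp.setContentType("text/html;charset=UTF-8");
        try {
            resp.getWriter().write(SnapshotRenderer.renderSnapshot(url));
        } catch (Exception e) {
            throw new ServletException(e);
        }
    }

    @Override
    public void init(FilterConfig config) {
    }

    @Override
    public void destroy() {
    }
}
```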

But the filter did not work for the "MyApp.html" file. I found out that on App Engine all files are treated as static files by default and are not passed through the filter. I had to exclude the ".html" files from the static files. I did this by adding an <exclude> entry to the <static-files> section of "appengine-web.xml".
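For reference, the relevant section of appengine-web.xml can look roughly like this (the app id, version, and exclude pattern are placeholders; adjust them to your project):

```xml
<appengine-web-app xmlns="http://appspot.com/ns/1.0">
  <application>your-app-id</application>
  <version>1</version>
  <!-- Exclude HTML files from static serving so requests for the
       host page go through the servlet stack and hit the crawl filter. -->
  <static-files>
    <exclude path="/**.html" />
  </static-files>
</appengine-web-app>
```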

I hope this helps people with the same problem save some time :)

Thanks and best regards, jan

jan