
I am in the process of developing an online music magazine. We have an HTML5/Flash music player, and it forms a major part of the website, but the site also has a lot of articles and other content. So basically, I want seamless music playback across page loads, but I also want to avoid a complete JavaScript application, because I want all the content to be spider-friendly and indexable by Google.

I use the HTML5 History API with a hashbang (#!) fallback for loading various content within the main page on clicks. The URLs loaded also point to real pages containing the content.
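Roughly, the navigation code looks like this (a simplified sketch, not my exact code; loadContent is a stand-in for the real AJAX loader):

```javascript
// Stand-in for the real loader: fetch `href` with XMLHttpRequest and
// swap the article markup in without touching the music player.
function loadContent(href) { /* ... */ }

function navigate(href) {
  if (window.history && window.history.pushState) {
    // Chrome/Safari: keep the real URL, e.g. /page1.html
    history.pushState(null, '', href);
    loadContent(href);
  } else {
    // Firefox/IE fallback: hashbang URL, e.g. /#!/page1.html
    location.hash = '#!' + href;
  }
}

// In the fallback case, react to the hash change and load the content.
window.onhashchange = function () {
  if (location.hash.indexOf('#!') === 0) {
    loadContent(location.hash.substring(2));
  }
};

// Links stay plain <a href="/page1.html"> so spiders can follow them;
// the onclick only runs for JavaScript users.
function hijack(anchor) {
  anchor.onclick = function () {
    navigate(anchor.pathname);
    return false; // cancel the normal full page load
  };
}
```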

For example: a munimkazia.com/page1.html link on my index page munimkazia.com will load the content from page1.html and insert it. The URL will change to munimkazia.com/#!/page1.html in Firefox and IE, and to munimkazia.com/page1.html in Chrome. Since the href of the link is munimkazia.com/page1.html, the spider will follow the link and fetch the content. I have the page set up properly at page1.html, ready for viewing. But now I have problems.

If I use AJAX loads on this page, the URLs appearing in the browser's location bar will not be consistent with the hashbang fallback (http://munimkazia.com/page1.html/#!/page2.html). If I instead redirect all clicks to the main container page at http://munimkazia.com and load page2.html from there, everything will work fine afterwards, but that redirect causes a full page load, which interrupts any music playing at the time.

Also, I don't want to rewrite every http://munimkazia.com/page1.html to http://munimkazia.com/#!/page1.html, because I want all the content to be present in the markup, not fetched and written by JavaScript, so that search engine spiders can read it. I am aware that Google has a spec for crawling content from #! URLs, but I want the page to load with the article content for the user even if JS is disabled.

Any ideas/advice/workarounds?

Edit: Those URLs are just examples to explain my point. There is no JavaScript code to fetch pages at munimkazia.com.

Munim

1 Answer


Hash-bang #! URLs can be indexed by Google; that's kinda the whole point of them, otherwise people would just use the hash # on its own.

I think the idea is that Google sees the #! URL and converts it into a querystring parameter, e.g. example.com/#!/products/123/ipod-nano-32gb becomes example.com/?_escaped_fragment_=/products/123/ipod-nano-32gb, but users still use the hash-bang URL. You program the server to respond to the ?_escaped_fragment_ parameter, while JavaScript users get redirected to the proper #! URL.
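As a rough illustration, the server side might look like the following (a Node.js sketch; renderArticleHtml and appShell are placeholders for your own rendering code):

```javascript
// Sketch: serve rendered HTML to Googlebot's ?_escaped_fragment_
// requests, and the JavaScript shell to everyone else.
var http = require('http');
var url = require('url');

var appShell = '<html><body><div id="content"></div>' +
               '<script src="/app.js"></script></body></html>';

function renderArticleHtml(path) {
  // Placeholder: look up the article for `path` and return full HTML.
  return '<html><body><h1>Article for ' + path + '</h1></body></html>';
}

http.createServer(function (req, res) {
  var query = url.parse(req.url, true).query;
  res.writeHead(200, { 'Content-Type': 'text/html' });

  if (query._escaped_fragment_ !== undefined) {
    // Googlebot turned /#!/page1.html into /?_escaped_fragment_=/page1.html
    res.end(renderArticleHtml(query._escaped_fragment_));
  } else {
    // Normal visitors get the shell, which reads the #! fragment itself.
    res.end(appShell);
  }
}).listen(8080);
```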

Check out Google's specification here: http://code.google.com/web/ajaxcrawling/docs/getting-started.html

I don't think it's a good idea to use both types of URL, as you'd have two URLs being posted on blogs, Twitter etc. by users for the same page, and it would also be a nightmare to write the code to handle it reliably. You'd probably have to settle for hash-bangs for now, until the HTML5 History API is more broadly supported.

Sunday Ironfoot
  • I think it's a good idea to use both, because the regular URLs that can be used with the History API are so much better. You can use a canonical link to help Google realise both URLs point to the same content. – Bart van Heukelom Feb 16 '11 at 09:00
  • Fair point, but my issue is that Chrome/Safari users would post /products/123/ipod-nano links while FF & IE users would post /#!/products/123/ipod-nano. But if you can get around the URL canonicalisation issues, and the insane architectural code needed to support this, then perhaps it could work. However, sites such as Twitter and Facebook decided not to bother, and I'm wondering why. – Sunday Ironfoot Feb 16 '11 at 09:05
  • I am aware of Google's AJAX crawling spec, and I was kinda thinking of that as a last resort, because no other search engine supports the spec. Also, Facebook does use the History API. Try opening a photo from your news feed in Chrome and your browser URL will change to photo.php. In Firefox it adds a #!, but it uses the History API in Chrome. Give it a shot :) – Munim Feb 16 '11 at 09:22
  • I just remembered another **major** problem with the sole #! approach. If I redirect every munimkazia.com/page1.html request to munimkazia.com/#!/page1.html to load the main container and then fetch the page by AJAX, everything will fail if JS is disabled. I at least want the page to load with the article if JS is disabled. – Munim Feb 16 '11 at 09:28
  • If you're just going to get Google et al. to crawl the regular URLs, then you don't need the hash-bang URLs; just use the hash # on its own, because Google sees #! and will try to do the _escaped_fragment_ thing, which other search engines don't support, as you say. I didn't say using both approaches was impossible, just very difficult, but good luck. I would perhaps approach it in a progressively enhanced way: build a regular site with normal URLs that do regular GET requests, add history.pushState support, then gradually add # fragment support, etc. (a sketch of this appears below these comments). – Sunday Ironfoot Feb 16 '11 at 09:44
  • Yep. Actually, we have an online music magazine already, which I am in the process of upgrading. I have started implementing my approach and things look pretty okay so far. As for my initial question, I am thinking of reloading the page inside the AJAX app's container if the user tries to start playing music. That should ensure seamless music. Anyway, thanks for your input – Munim Feb 16 '11 at 11:36
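A minimal sketch of the progressive-enhancement approach suggested above (all function names here are placeholders, not anyone's actual code): plain links always work, and JavaScript layers pushState or the #! fallback on top.

```javascript
// With JavaScript off, none of this runs and the plain
// <a href="/page1.html"> performs a normal GET, so the article still
// loads (at the cost of interrupting the music). With JavaScript on,
// the click is intercepted and handled in-page.
function loadContent(href) { /* placeholder: AJAX-load `href` */ }

function enhance(anchor) {
  anchor.onclick = function () {
    if (window.history && window.history.pushState) {
      history.pushState(null, '', anchor.pathname); // real URL
    } else {
      location.hash = '#!' + anchor.pathname;       // #! fallback
    }
    loadContent(anchor.pathname);
    return false; // cancel the full page load only when JS runs
  };
}
```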