The crawler escapes "mydomain#!article" into "mydomain?_escaped_fragment_=article"; how do I retrieve the original URL?

Question

Ok, here is what Google said (https://developers.google.com/webmasters/ajax-crawling/docs/getting-started).

When a crawler sees a URL like www.example.com/ajax.html#!key=value, it temporarily converts that URL into www.example.com/ajax.html?_escaped_fragment_=key=value

However, it also escapes certain characters in the fragment during the transformation. For example, www.example.com/ajax.html#!key=value;car=% becomes www.example.com/ajax.html?_escaped_fragment_=key=value;car=%25

So if we want to convert www.example.com/ajax.html?_escaped_fragment_=key=value;car=%25 back to the original URL, we need to unescape all %XX sequences in the fragment.

Google said:

Note: The crawler escapes certain characters in the fragment during the transformation. To retrieve the original fragment, make sure to unescape all %XX characters in the fragment. More specifically, %26 should become &, %20 should become a space, %23 should become #, and %25 should become %, and so on.

But Google doesn't say how to do that in Java.

String originalUrl = changedStr.replace("?_escaped_fragment_=", "#!");
// What should I do next so that all the escaped characters are restored?

Is it OK to do it like this?

originalUrl = java.net.URLDecoder.decode(originalUrl, "UTF-8");

Which encoding should I use: "UTF-8" or "ASCII"?

When the crawler escapes the URL, does it use URLEncoder.encode()?

If it does, which encoding does it use: "UTF-8" or "ASCII"?

score 0 · Answer 1 · edited May 23 '17 at 10:32

You may want to look at this Stack Overflow working example. The function rewriteQueryString at the end should be of particular interest to you.

In short, you are on the right track: the key step is calling URLDecoder.decode. The wrapper code around it in that example may also be useful.
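As a rough illustration of the URLDecoder.decode approach, here is a minimal sketch. The class name EscapedFragment and the method toOriginalUrl are made up for this example and are not taken from the linked code. It assumes the crawler percent-encodes the fragment as UTF-8, which Google's note does not state explicitly; for fragments that contain only ASCII characters, "UTF-8" and "ASCII" decode identically. It also assumes _escaped_fragment_ is the last query parameter, so everything after it belongs to the fragment.

```java
import java.io.UnsupportedEncodingException;
import java.net.URLDecoder;

// Hypothetical helper, named for this example only.
final class EscapedFragment {
    private static final String MARKER = "_escaped_fragment_=";

    // Rebuilds the "#!" URL from a crawler URL containing _escaped_fragment_.
    // Assumption: the marker is the last query parameter.
    static String toOriginalUrl(String crawlerUrl) {
        int idx = crawlerUrl.indexOf(MARKER);
        if (idx < 1) {
            return crawlerUrl; // no marker found, nothing to convert
        }
        // The character just before the marker is the '?' or '&' separator.
        String base = crawlerUrl.substring(0, idx - 1);
        String escaped = crawlerUrl.substring(idx + MARKER.length());
        try {
            // Decode only the fragment, not the whole URL.
            // Assumption: the crawler used UTF-8 when escaping.
            // Note: URLDecoder also turns a literal '+' into a space.
            String fragment = URLDecoder.decode(escaped, "UTF-8");
            return base + "#!" + fragment;
        } catch (UnsupportedEncodingException e) {
            // UTF-8 is always available on a compliant JVM.
            throw new IllegalStateException(e);
        }
    }

    public static void main(String[] args) {
        System.out.println(toOriginalUrl(
            "www.example.com/ajax.html?_escaped_fragment_=key=value;car=%25"));
        // prints www.example.com/ajax.html#!key=value;car=%
    }
}
```

Decoding only the part after the marker, rather than the whole URL as in the question, avoids accidentally unescaping characters in the path or in other query parameters.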

edited May 23 '17 at 10:32

Community


answered May 16 '14 at 15:53

Patrick
