
I am trying to grab all the links in a web page with the script below, and then I want to request those links again. However, decoding does not always work, which results in a malformed link, and I receive a 404 error.

Document doc = Jsoup.connect(doi_con).ignoreContentType(true).get();

Elements links = doc.select("a[href]");

for (Element link : links) {
    String url = link.absUrl("href");

    //byte[] decodeds1= DatatypeConverter.parseBase64Binary(url);
    //dec_url = DatatypeConverter.printBase64Binary(decodeds1);

    dec_url = java.net.URLDecoder.decode(url, "UTF-8");
}

In this code, the decoding step seems to work for only some URLs. Here are two sample results:

http://link.springer.com/signup-login?previousUrl=/article/10.1007%2Fs10899-005-5558-2
http://link.springer.com/article/10.1007/s10899-005-5558-2#kb-nav--main

As you can see, decoding did not work for the first link (it still contains %2F), while it worked for the latter.

What am I missing? I also tried parseBase64Binary and printBase64Binary, as seen in the commented-out lines of the code above, but that did not work either.

Thanks in advance!

Madhawa Priyashantha
mlee_jordan
  • What are the source strings? – Boris Oct 14 '14 at 12:33
  • @boraldomaster String url = link.absUrl("href"); The source strings are the URLs retrieved from the related web page... – mlee_jordan Oct 14 '14 at 12:43
  • I am asking you to provide those URLs. If I had them, I could run this code. – Boris Oct 14 '14 at 12:54
  • You can check this URL: http://link.springer.com/article/10.1007%2Fs10899-005-5558-2 – mlee_jordan Oct 14 '14 at 14:10
  • %25 is decoded to %. Everything is correct. What do you expect to receive? – Boris Oct 14 '14 at 14:59
  • For the first one, article/10.1007%2Fs10899-005-5558-2; for the second, article/10.1007/s10899-005-5558-2. The slash was not converted for the first one; it stays as %2F. – mlee_jordan Oct 14 '14 at 15:14
  • I suppose we don't understand each other. Let's forget about the second link and talk just about the first one. It is **...link.springer.com/signup-login?previousUrl=%2Farticle%2F10.1007%252Fs10899-005-5558-2**, right? It is converted to **...link.springer.com/signup-login?previousUrl=/article/10.1007%2Fs10899-005-5558-2**, right? What do you want to receive as the conversion result for **...link.springer.com/signup-login?previousUrl=%2Farticle%2F10.1007%252Fs10899-005-5558-2**? – Boris Oct 15 '14 at 06:12
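
The behaviour discussed in the comments can be reproduced with a small sketch. It assumes the previousUrl value quoted above is double-encoded (%252F is the encoding of %2F, which in turn encodes /), so a single URLDecoder pass only removes one layer. The class name PercentDecodeDemo and the helper decodeFully are hypothetical and only for illustration; they are not part of Jsoup or the original code.

```java
import java.io.UnsupportedEncodingException;
import java.net.URLDecoder;

public class PercentDecodeDemo {

    // One decoding pass: every %XX escape is replaced by the character it encodes.
    static String decodeOnce(String s) throws UnsupportedEncodingException {
        return URLDecoder.decode(s, "UTF-8");
    }

    // Hypothetical helper: decode repeatedly until the string stops changing,
    // with an upper bound on the number of passes.
    static String decodeFully(String s, int maxPasses) throws UnsupportedEncodingException {
        String current = s;
        for (int i = 0; i < maxPasses; i++) {
            String next = decodeOnce(current);
            if (next.equals(current)) {
                break; // nothing left to decode
            }
            current = next;
        }
        return current;
    }

    public static void main(String[] args) throws UnsupportedEncodingException {
        // The previousUrl query value from the first sample link, before any decoding.
        String raw = "%2Farticle%2F10.1007%252Fs10899-005-5558-2";

        // First pass: %2F becomes "/", %25 becomes "%", leaving a literal "%2F".
        System.out.println(decodeOnce(raw));

        // Further passes remove the remaining layer.
        System.out.println(decodeFully(raw, 5));
    }
}
```

Whether a second pass is appropriate depends on what the link is used for: the extra encoding layer is there because the path is embedded as a query parameter, so decoding the whole URL twice changes its meaning. Repeated decoding can also throw IllegalArgumentException if an intermediate result contains a % that is not followed by two hex digits.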

0 Answers