
I did a little research on how to request HTTP sites and browse them offline, and I found this as one possible answer:

http://www.javaworld.com/jw-05-2000/jw-0518-offload.html

but it's not very complete or intuitive. Does anyone have a good literature source on this topic that I could use?

Thanks in advance

Victor Oliveira

2 Answers


Use Jsoup: Java HTML Parser

jsoup is a Java library for working with real-world HTML. It provides a very convenient API for extracting and manipulating data, using the best of DOM, CSS, and jquery-like methods.

Reading the content is as easy as this:

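// Fetch and parse the page (throws IOException on network errors)
Document doc = Jsoup.connect("http://en.wikipedia.org/").get();
// CSS selector: <a> links inside <b> elements within the element whose id is "mp-itn"
Elements newsHeadlines = doc.select("#mp-itn b a");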
Juned Ahsan
  • I got the idea of jsoup from reading the documentation, but the thing is, this connect method opens the connection and takes objects and sub-objects from the URL. Do I then need to handle saving it to my HD myself? – Victor Oliveira Jul 03 '13 at 17:22
  • @VictorOliveira jsoup loads the entire HTML into a Document object. You can traverse that document and fetch the nodes. – Juned Ahsan Jul 03 '13 at 17:25
  • I'm sorry, but I still don't see how I'm going to access this object and its content. Won't I download it as one file? – Victor Oliveira Jul 03 '13 at 17:29
  • @VictorOliveira The Document object has many utility methods for reading different parts of the page. Check this tutorial to get a feel for it: http://www.mkyong.com/java/jsoup-html-parser-hello-world-examples/ – Juned Ahsan Jul 03 '13 at 17:30
  • I'm thinking about how to combine both approaches, because I already have the connection and the download of the HTML to my HD. The problem is that the links in the HTML I download are not locally accessible, and for that I would need something like jsoup. Or am I misunderstanding something? By the way, thanks for the link. – Victor Oliveira Jul 03 '13 at 17:42
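Victor's last point (links in the saved HTML still point at the web) is usually handled by mapping every link to the local file it will be saved as. The following is a minimal JDK-only sketch, not part of either answer: the class `LocalLinkMapper`, the method `toLocalPath`, and the host/path file layout with `index.html` for directory URLs are all assumptions chosen for illustration.

```java
import java.net.URI;

/**
 * Sketch (assumption: one local file per URL, laid out as host/path) of how a
 * link found in a downloaded page can be mapped to the local file it would be saved as.
 */
public class LocalLinkMapper {

    // Resolve a (possibly relative) href against the URL of the page it appeared on,
    // then turn the absolute URL into a local path of the form host/path.
    static String toLocalPath(String pageUrl, String href) {
        URI resolved = URI.create(pageUrl).resolve(href);
        String path = resolved.getPath();
        if (path == null || path.isEmpty()) {
            path = "/";
        }
        // Directory-style URLs ("/", "/wiki/") get an index.html file name (assumed convention).
        if (path.endsWith("/")) {
            path += "index.html";
        }
        return resolved.getHost() + path;
    }

    public static void main(String[] args) {
        String page = "http://en.wikipedia.org/wiki/Main_Page";
        System.out.println(toLocalPath(page, "/wiki/Java"));          // root-relative link
        System.out.println(toLocalPath(page, "Java"));                // document-relative link
        System.out.println(toLocalPath(page, "http://example.com/")); // absolute link to another host
    }
}
```

Each href in the downloaded page could then be replaced with the corresponding local path, made relative to the page's own local location (for example with `java.nio.file.Path.relativize`). With jsoup, that would mean reading each link with `absUrl("href")` and writing it back with `attr("href", ...)`. Note that `URI.create` throws on hrefs containing characters such as unescaped spaces, so a real crawler would need to handle that case.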

Use Jsoup:

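Document doc = Jsoup.connect("http://en.wikipedia.org/").get();
String html = doc.html();
// save html in a file (needs java.nio.file.Files/Paths and java.nio.charset.StandardCharsets)
Files.write(Paths.get("page.html"), html.getBytes(StandardCharsets.UTF_8));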
surfealokesea
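The "save html in a file" step from the last answer can be fleshed out with the JDK alone. This is a minimal sketch, assuming a hypothetical `savePage` helper and a local mirror directory named `mirror`. It creates any missing parent directories, so pages can be stored in a folder tree, and it writes the string as UTF-8.

```java
import java.io.IOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.Paths;

public class SavePage {

    // Hypothetical helper: write the HTML string to baseDir/relativePath,
    // creating any missing parent directories first. Returns the file written.
    static Path savePage(Path baseDir, String relativePath, String html) throws IOException {
        Path target = baseDir.resolve(relativePath);
        Path parent = target.getParent();
        if (parent != null) {
            Files.createDirectories(parent);
        }
        // Encode explicitly as UTF-8 rather than relying on the platform default charset.
        Files.write(target, html.getBytes(StandardCharsets.UTF_8));
        return target;
    }

    public static void main(String[] args) throws IOException {
        // With jsoup, this string would come from doc.html().
        String html = "<html><body><p>Hello, offline world</p></body></html>";
        Path saved = savePage(Paths.get("mirror"), "en.wikipedia.org/index.html", html);
        System.out.println("Saved " + Files.size(saved) + " bytes to " + saved.toAbsolutePath());
    }
}
```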