Offline HTTP explore with Java

Question

I did a little research on how to request HTTP sites and explore them offline, and found this as one possible answer:

http://www.javaworld.com/jw-05-2000/jw-0518-offload.html

but it is neither complete nor intuitive. Does anyone have a good literature source on this topic that I could use?

Thanks in advance

score 2 · Accepted Answer · answered Jul 03 '13 at 17:06

Use Jsoup: Java HTML Parser

jsoup is a Java library for working with real-world HTML. It provides a very convenient API for extracting and manipulating data, using the best of DOM, CSS, and jquery-like methods.

Reading the content is as easy as this:

Document doc = Jsoup.connect("http://en.wikipedia.org/").get();
Elements newsHeadlines = doc.select("#mp-itn b a");

Juned Ahsan

I got the idea of jsoup from reading the documentation, but the thing is: this connect method opens the connection and takes objects and sub-objects from the URL, and then I need to handle saving them to my HD myself? – Victor Oliveira Jul 03 '13 at 17:22
@VictorOliveira JSoup loads the entire HTML into a Document object. You can traverse that document and fetch the nodes. – Juned Ahsan Jul 03 '13 at 17:25
I'm sorry, but I still don't get how I'm going to access this object and its content. Won't I download it as one file? – Victor Oliveira Jul 03 '13 at 17:29
@VictorOliveira The Document object has a lot of utility methods to read different parts of the page. Check this tutorial to get a feel for it: http://www.mkyong.com/java/jsoup-html-parser-hello-world-examples/ – Juned Ahsan Jul 03 '13 at 17:30
I'm thinking about how to merge both approaches, because I already have the connection and the download of the HTML to my HD. The problem is that the links that come with the HTML I download are not locally accessible; for that I would need something like jsoup. Or am I misunderstanding something? By the way, thanks for the link. – Victor Oliveira Jul 03 '13 at 17:42
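
On the link problem raised in the last comment: a saved page usually contains relative links, which only make sense relative to the URL the page came from. A minimal sketch, using only java.net.URI from the JDK rather than jsoup, of how such a link could be resolved back to an absolute URL so the linked page can be downloaded as well. The class name LinkResolver and the method toAbsolute are hypothetical names chosen for illustration, and the sketch assumes the href is a syntactically valid URI (URI.create throws IllegalArgumentException otherwise).

```java
import java.net.URI;

class LinkResolver {
    // Hypothetical helper: resolves an href found in a page against the
    // URL the page was downloaded from. Absolute hrefs are returned as-is.
    static String toAbsolute(String pageUrl, String href) {
        return URI.create(pageUrl).resolve(href).toString();
    }

    public static void main(String[] args) {
        // Example page URL and hrefs are assumptions for illustration.
        String page = "http://en.wikipedia.org/wiki/Main_Page";
        System.out.println(toAbsolute(page, "/wiki/Java"));
        System.out.println(toAbsolute(page, "Help"));
    }
}
```

Each resolved URL can then be fetched and saved in the same way as the first page.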

score 1 · Answer 2 · answered Jul 03 '13 at 17:06

Use Jsoup:

Document doc = Jsoup.connect("http://en.wikipedia.org/").get();
String html = doc.html();
//save html in a file
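
The "save html in a file" step could look roughly like the sketch below, which uses only the JDK. The class name SavePage, the helper saveHtml, the file name page.html, and the UTF-8 encoding are assumptions made for illustration; the placeholder string in main stands in for the value returned by doc.html().

```java
import java.io.IOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.Paths;

class SavePage {
    // Hypothetical helper: writes the HTML text to the target file as UTF-8,
    // creating the file or overwriting it if it already exists.
    static Path saveHtml(String html, Path target) throws IOException {
        return Files.write(target, html.getBytes(StandardCharsets.UTF_8));
    }

    public static void main(String[] args) throws IOException {
        // Stand-in for the string returned by doc.html().
        String html = "<html><body><p>Hello</p></body></html>";
        Path saved = saveHtml(html, Paths.get("page.html"));
        System.out.println("Saved to " + saved.toAbsolutePath());
    }
}
```

The saved file can then be opened in a browser without a network connection, although images, stylesheets, and linked pages are not included unless they are downloaded separately.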

surfealokesea
