How to scrape all data from website with Jsoup?

Question

I need to scrape all data from those sites:

I use Jsoup, and the program must scrape all the text from each site. As you can see, the sites have different structures, so I need a common approach that works for all of them.

You could analyse the HTML of a site and work out what 'the content' element is based on how many characters it contains. Other heuristics, like where it is rendered, are a bit too complicated if you are a beginner. The other approach is to hold an XPath/CSS query per site to describe what should be scraped. (I've -1 as I don't feel this question demonstrates any effort, and we do like to see prior research here). — halfer, Jun 09 '14 at 20:15
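
The character-count idea from the comment could look roughly like the sketch below. It is only an illustration under some assumptions: the class and method names (`ContentGuesser`, `guessContent`, `score`) and the list of candidate tags are made up for this example, and the scoring rule (own text plus the text of direct `<p>` children) is just one simple way to count characters. It parses an inline HTML string so it can be tried without a network connection.

```java
import org.jsoup.Jsoup;
import org.jsoup.nodes.Document;
import org.jsoup.nodes.Element;

public class ContentGuesser {

    // Score a container by the characters it holds directly: its own text
    // plus the text of its immediate <p> children (an assumed, simple rule).
    static int score(Element el) {
        int chars = el.ownText().length();
        for (Element child : el.children()) {
            if (child.tagName().equals("p")) {
                chars += child.text().length();
            }
        }
        return chars;
    }

    // Return the text of the highest-scoring container, or the whole body
    // text if the page has no candidate container at all.
    public static String guessContent(String html) {
        Document doc = Jsoup.parse(html);
        Element best = null;
        // Candidate tags are an assumption; adjust them to the sites at hand.
        for (Element el : doc.select("div, article, section, main, td")) {
            if (best == null || score(el) > score(best)) {
                best = el;
            }
        }
        return best != null ? best.text() : doc.body().text();
    }

    public static void main(String[] args) {
        String html = "<html><body>"
                + "<div id='nav'><a href='/'>Home</a> <a href='/about'>About</a></div>"
                + "<div id='main'><p>This privacy notice explains what data we collect.</p>"
                + "<p>It also explains how long we keep it.</p></div>"
                + "</body></html>";
        System.out.println(guessContent(html));
    }
}
```

A heuristic like this will misfire on some layouts (for example, pages whose text sits in deeply nested wrappers), so it is best treated as a fallback for sites that have no hand-written selector.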

score 0 · Accepted Answer · answered Jun 10 '14 at 11:36

Try this:

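import org.jsoup.Jsoup;
import org.jsoup.nodes.Document;
import org.jsoup.nodes.Element;
import java.io.IOException;

public class Sample {

    public static void main(String[] args) throws IOException {
        // Each site gets its own CSS query for the element that holds the notice.
        System.out.println(getPrivacyNotice("http://www.gameloft.com/privacy-notice/", "div.terms"));
        System.out.println(getPrivacyNotice("http://outfit7.com/privacy-policy/#", "div#main"));
    }

    // Fetches the page and returns the text of the first element matching cssQuery,
    // or an empty string if nothing matches (first() returns null in that case).
    public static String getPrivacyNotice(String url, String cssQuery) throws IOException {
        Document doc = Jsoup.connect(url).get();
        Element match = doc.select(cssQuery).first();
        return match != null ? match.text() : "";
    }
}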
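
The code above hard-codes one selector per call. If more sites are added, one way to keep that manageable is a small lookup table from host name to CSS query. The following is a sketch only: `SelectorRegistry`, `selectorFor`, and the `body` fallback are hypothetical names and choices for this example, and the two entries simply reuse the selectors already shown. It uses only the standard library (Java 9+ for `Map.of`).

```java
import java.net.URI;
import java.util.Locale;
import java.util.Map;

public class SelectorRegistry {

    // Hypothetical lookup table: host name -> CSS query for the main content.
    // The two entries reuse the selectors from the answer.
    private static final Map<String, String> SELECTORS = Map.of(
            "www.gameloft.com", "div.terms",
            "outfit7.com", "div#main");

    // Assumed fallback for hosts without an entry: take the whole <body>.
    private static final String DEFAULT_SELECTOR = "body";

    // Looks up the CSS query for the host part of the given URL.
    public static String selectorFor(String url) {
        String host = URI.create(url).getHost();
        if (host == null) {
            return DEFAULT_SELECTOR;
        }
        return SELECTORS.getOrDefault(host.toLowerCase(Locale.ROOT), DEFAULT_SELECTOR);
    }

    public static void main(String[] args) {
        System.out.println(selectorFor("http://www.gameloft.com/privacy-notice/"));
        System.out.println(selectorFor("http://outfit7.com/privacy-policy/#"));
        System.out.println(selectorFor("http://example.com/"));
    }
}
```

The returned string can then be passed as the second argument to `getPrivacyNotice`, so supporting a new site means adding one map entry rather than another hard-coded call.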
