-2

I need to scrape all the data from these sites:

I am using jsoup, and the program must scrape all the text from each site. As you can see, the sites have different structures, so I need an approach that works for all of them.

halfer
user3661720
  • You could analyse the HTML of a site and work out what 'the content' is based on how many characters it contains. Other heuristics, such as where it is rendered, are a bit too complicated if you are a beginner. The other approach is to hold an XPath/CSS query per site to describe what should be scraped. (I've voted -1, as I don't feel this question demonstrates any effort, and we do like to see prior research here.)
    – halfer Jun 09 '14 at 20:15
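
The per-site query idea from the comment above could be sketched roughly as follows. This is only an illustration under assumptions: the class name `SiteSelectors`, the method `selectorFor`, and the choice of `body` as a catch-all fallback are hypothetical, and the two selectors are simply the ones used in the answer on this page. The returned string is meant to be passed to jsoup's `Document.select`.

```java
import java.net.URI;
import java.util.Locale;
import java.util.Map;

public class SiteSelectors {

    // Hypothetical registry: host name -> CSS query for the element holding the text.
    private static final Map<String, String> SELECTORS = Map.of(
            "www.gameloft.com", "div.terms",
            "outfit7.com", "div#main");

    // Assumed fallback for unknown sites: "body" matches the whole page body,
    // so all visible text is returned, menus and footers included.
    private static final String DEFAULT_SELECTOR = "body";

    // Returns the CSS query registered for the URL's host, or the fallback.
    public static String selectorFor(String url) {
        String host = URI.create(url).getHost();
        if (host == null) {
            return DEFAULT_SELECTOR;
        }
        return SELECTORS.getOrDefault(host.toLowerCase(Locale.ROOT), DEFAULT_SELECTOR);
    }

    public static void main(String[] args) {
        System.out.println(selectorFor("http://www.gameloft.com/privacy-notice/"));
        System.out.println(selectorFor("http://outfit7.com/privacy-policy/#"));
        System.out.println(selectorFor("http://example.com/"));
    }
}
```

Keeping the queries in one map means that supporting a new site is a one-line change, while unknown sites still yield some text through the fallback, at the cost of extra noise.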

1 Answer

0

Try this:

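import org.jsoup.Jsoup;
import org.jsoup.nodes.Document;
import org.jsoup.nodes.Element;
import java.io.IOException;

public class Sample {

    public static void main(String[] args) throws IOException {
        // Each site needs its own CSS query for the element that holds the text.
        System.out.println(getPrivacyNotice("http://www.gameloft.com/privacy-notice/", "div.terms"));
        System.out.println(getPrivacyNotice("http://outfit7.com/privacy-policy/#", "div#main"));
    }

    // Fetches the page and returns the text of the first element matching cssQuery,
    // or an empty string if nothing matches (first() returns null in that case).
    public static String getPrivacyNotice(String url, String cssQuery) throws IOException {
        Document doc = Jsoup.connect(url).get();
        Element match = doc.select(cssQuery).first();
        return match == null ? "" : match.text();
    }
}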
Sestertius