I am extracting data from blogs using boilerpipe's ArticleExtractor, which returns the article text as a string. Some pages contain sub-links that lead to the actual news content, and I want to extract that content too. How can I access the data behind those sub-links? My code is this:
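String news = " ";
try
{
    // Tag page that lists links to the individual news articles
    URL url = new URL("http://www.firstpost.com/tag/crime-in-india");
    // Fetch the page and parse it into a boilerpipe TextDocument
    InputSource is = HTMLFetcher.fetch(url).toInputSource();
    BoilerpipeSAXInput in = new BoilerpipeSAXInput(is);
    TextDocument doc = in.getTextDocument();
    // Extract the main article text from this page only
    news = ArticleExtractor.INSTANCE.getText(doc);
}
catch (Exception e)
{
    // Fetching or parsing failed; news keeps its initial value
    e.printStackTrace();
}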
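One direction that might work, shown here only as a rough sketch: collect the href values from the page's raw HTML, resolve them against the page URL, and then run each resulting URL through the same extraction code as above. The class name LinkCollector, the method extractLinks, and the regular expression are hypothetical placeholders and are not part of boilerpipe. A regex is a crude way to read HTML, so a proper HTML parser would likely be more robust.

```java
import java.net.URI;
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical helper (not part of boilerpipe): collects absolute
// http/https link targets from a page's raw HTML.
public class LinkCollector {

    // Assumption: links appear as <a ... href="..."> or <a ... href='...'>.
    private static final Pattern HREF = Pattern.compile(
            "<a\\s[^>]*?href\\s*=\\s*[\"']([^\"']+)[\"']",
            Pattern.CASE_INSENSITIVE);

    public static List<String> extractLinks(String html, String baseUrl) {
        URI base = URI.create(baseUrl);
        List<String> links = new ArrayList<>();
        Matcher m = HREF.matcher(html);
        while (m.find()) {
            String href = m.group(1).trim();
            if (href.isEmpty() || href.startsWith("#")) {
                continue; // skip in-page anchors
            }
            try {
                // Resolve relative links against the page URL
                URI resolved = base.resolve(href);
                String scheme = resolved.getScheme();
                boolean isWeb = "http".equalsIgnoreCase(scheme)
                        || "https".equalsIgnoreCase(scheme);
                if (isWeb && !links.contains(resolved.toString())) {
                    links.add(resolved.toString());
                }
            } catch (IllegalArgumentException e) {
                // Malformed href value: ignore it and keep going
            }
        }
        return links;
    }
}
```

Each URL in the returned list could then be passed to new URL(...) and through the same HTMLFetcher and ArticleExtractor calls shown in my code, ideally after filtering the list down to links that actually point at articles.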