1

I'm trying to extract a news article from a link. I use the following code to select the article body by its class name. I'm fairly sure the class exists on the page, but the code fails to return any content. The same code works for other, similar sites.

Document document = Jsoup.connect(newsLink).get();
Elements element = document.getElementsByClass("ins_storybody");
story = element.text();
Stephan
ashif-ismail

3 Answers

1

I am not sure why your solution does not work, but using a CSS selector instead should work:

String story = document.select("div.ins_storybody").text();
luksch
0

Both of the following worked for me:

Document doc= Jsoup.connect("http://www.ndtv.com/world-news/apple-paid-ceo-tim-cook-10-3-million-in-2015-1263130").get();
Elements element = doc.getElementsByClass("ins_storybody");
String text= element.text();
System.out.println(text);


Document doc = Jsoup.connect("http://www.ndtv.com/world-news/apple-paid-ceo-tim-cook-10-3-million-in-2015-1263130").get();
String text  = doc.select("div.ins_storybody").text();
System.out.println(text);

Have you checked that you provided the correct URL? Try printing the 'doc' variable to your console; it should hold the contents of the web page.
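
If the printed page is too long to scan by eye, a small helper can do the check for you. The following is only a rough sketch using the standard library: the class name ClassPresenceCheck and the method mentionsClass are made up for illustration, the HTML string is a hypothetical stand-in for the fetched page (for example the result of doc.outerHtml()), and the regex is a crude diagnostic, not a real HTML parser.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class ClassPresenceCheck {

    // Matches class="..." or class='...' and captures the attribute value.
    // This is a crude text match, not an HTML parser.
    private static final Pattern CLASS_ATTR = Pattern.compile(
            "class\\s*=\\s*([\"'])(.*?)\\1",
            Pattern.CASE_INSENSITIVE | Pattern.DOTALL);

    /** Returns true if any class attribute in the raw HTML lists the given class name. */
    public static boolean mentionsClass(String html, String className) {
        Matcher m = CLASS_ATTR.matcher(html);
        while (m.find()) {
            // A class attribute may hold several space-separated names.
            for (String token : m.group(2).trim().split("\\s+")) {
                if (token.equals(className)) {
                    return true;
                }
            }
        }
        return false;
    }

    public static void main(String[] args) {
        // Hypothetical stand-in for the fetched page, e.g. doc.outerHtml().
        String html = "<html><body><div class=\"story ins_storybody\">"
                + "<p>Text</p></div></body></html>";
        System.out.println(mentionsClass(html, "ins_storybody"));
        System.out.println(mentionsClass(html, "ins_story"));
    }
}
```

If this returns false for the HTML you actually received, the class is not in the response at all, so the problem is with what was fetched (wrong URL, or a different page than the one your browser shows) rather than with the selector.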

Gareth1305
0

You can also try this CSS selector:

#ins_storybody

SAMPLE CODE

Document document = Jsoup.connect(newsLink).get();
Element element = document.select("#ins_storybody").first();
if (element==null) {
    throw new RuntimeException("Unable to locate story in: " + newsLink);
}
story = element.text();

The element can also be retrieved directly by its id (note: no leading '#'):

Element element = document.getElementById("ins_storybody");
Stephan