Extracting Paragraph from a news article by div class name

Question

I'm trying to extract a news article from a link. I use the following code to extract it by its class name. I'm pretty sure the class exists, but the code fails to get the contents. The same code works for other similar sites.

Document document = Jsoup.connect(newsLink).get();
Elements element = document.getElementsByClass("ins_storybody");
story = element.text();
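A useful first check: `getElementsByClass` does not throw when nothing matches. It returns an empty `Elements`, and `text()` on an empty list is simply `""`, so a missing element and an empty element look the same. The minimal sketch below makes the "no match" case fail loudly. `StoryDiagnostics` and `storyByClass` are hypothetical names, not part of jsoup, and the HTML in `main` is a made-up stand-in for the real page.

```java
import org.jsoup.Jsoup;
import org.jsoup.nodes.Document;
import org.jsoup.select.Elements;

// Hypothetical helper (not part of jsoup): fail loudly instead of returning "".
class StoryDiagnostics {

    static String storyByClass(Document document, String className) {
        Elements matches = document.getElementsByClass(className);
        if (matches.isEmpty()) {
            // Elements.text() on an empty list would silently return ""
            throw new IllegalStateException(
                "No element with class '" + className + "' in: " + document.location());
        }
        // Text of every matching element, joined with spaces
        return matches.text();
    }

    public static void main(String[] args) {
        // Parse a fixed, made-up snippet so the example runs without network access
        Document doc = Jsoup.parse("<div class=\"ins_storybody\"><p>Story text</p></div>");
        System.out.println(storyByClass(doc, "ins_storybody")); // prints "Story text"
    }
}
```

If this throws for the live page, the HTML that jsoup received does not contain the class, which points at the fetch rather than at the selector.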

Have you checked the class exists? Try reviewing the source of the linked webpage. — David Rushton, Jan 07 '16 at 12:13
I'm trying to get the news post from the above link in my Android app using the jsoup library. The website has a div with the specified class name, but the extraction fails. The same Java code works for other similar sites. — ashif-ismail, Jan 07 '16 at 12:13
Yes, the class exists. Please have a look to verify, in case I'm going wrong somewhere. — ashif-ismail, Jan 07 '16 at 12:14
Not familiar with `jsoup` but the native function `document.getElementsByClass` might need to be `document.getElementsByClassName` — William Isted, Jan 07 '16 at 12:50

score 1 · Answer 1 · answered Jan 07 '16 at 12:57


I am not sure why your solution does not work, but if you use the CSS selector functionality, it should work:

String story = document.select("div.ins_storybody").text();
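Since the goal is to pull out the article's paragraphs, the same selector syntax can also reach the individual `<p>` elements inside the story container via a descendant selector. This is a sketch that assumes the story body is a `div` with class `ins_storybody` containing `<p>` elements; the real page markup may differ. `StoryParagraphs` and `paragraphs` are hypothetical names.

```java
import java.util.ArrayList;
import java.util.List;
import org.jsoup.Jsoup;
import org.jsoup.nodes.Document;
import org.jsoup.nodes.Element;

class StoryParagraphs {

    // Returns the text of each <p> nested anywhere inside div.ins_storybody,
    // skipping paragraphs whose text is empty after whitespace trimming.
    static List<String> paragraphs(Document document) {
        List<String> result = new ArrayList<>();
        for (Element p : document.select("div.ins_storybody p")) {
            String text = p.text();
            if (!text.isEmpty()) {
                result.add(text);
            }
        }
        return result;
    }

    public static void main(String[] args) {
        // Made-up markup: the last <p> sits outside the story container
        Document doc = Jsoup.parse(
            "<div class=\"ins_storybody\"><p>First.</p><p> </p><p>Second.</p></div>"
            + "<p>Outside the story.</p>");
        System.out.println(paragraphs(doc)); // prints [First., Second.]
    }
}
```

Keeping each paragraph separate, rather than calling `text()` on the container, preserves the paragraph boundaries that a single joined string loses.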

answered Jan 07 '16 at 12:57 by luksch

I'll try and let you know – ashif-ismail Jan 07 '16 at 13:03

score 0 · Answer 2 · answered Jan 07 '16 at 14:15

Both of the following worked for me:

Document doc= Jsoup.connect("http://www.ndtv.com/world-news/apple-paid-ceo-tim-cook-10-3-million-in-2015-1263130").get();
Elements element = doc.getElementsByClass("ins_storybody");
String text= element.text();
System.out.println(text);


Document doc = Jsoup.connect("http://www.ndtv.com/world-news/apple-paid-ceo-tim-cook-10-3-million-in-2015-1263130").get();
String text = doc.select("div.ins_storybody").text();
System.out.println(text);

Have you checked that you are passing the correct URL? Try printing the 'doc' variable to the console; it should hold the contents of the webpage.
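Following that suggestion, here is a small sketch for checking what jsoup actually received before selecting from it. The user-agent string `"Mozilla/5.0"` and the 10-second timeout are arbitrary assumptions, not values the site is known to require; if a site serves different markup to different clients, setting a browser-like user agent is one thing to try. `FetchCheck`, `fetch` and `summary` are hypothetical names.

```java
import java.io.IOException;
import org.jsoup.Jsoup;
import org.jsoup.nodes.Document;

class FetchCheck {

    // Fetch with an explicit user agent and timeout (both assumed values).
    static Document fetch(String url) throws IOException {
        return Jsoup.connect(url)
                .userAgent("Mozilla/5.0")
                .timeout(10000) // milliseconds
                .get();
    }

    // One-line summary of what was received: the document's location,
    // its <title>, and how many elements carry the given class.
    static String summary(Document doc, String className) {
        return doc.location() + " | title=" + doc.title()
                + " | matches=" + doc.getElementsByClass(className).size();
    }
}
```

For example, `Document doc = FetchCheck.fetch(newsLink); System.out.println(FetchCheck.summary(doc, "ins_storybody"));`, followed by `System.out.println(doc.outerHtml());` to compare against the browser's view-source. On Android, run `fetch` off the main thread, since network access on the main thread throws `NetworkOnMainThreadException`.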

But it still doesn't work for me, and I don't know the actual reason. Anyhow, thanks for your help! — ashif-ismail, Jan 08 '16 at 14:12

Stephan · Accepted Answer · 2016-01-08T09:10:56.853


You can also try this CSS selector, which matches the element by its id:

#ins_storybody

SAMPLE CODE

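import org.jsoup.Jsoup;
import org.jsoup.nodes.Document;
import org.jsoup.nodes.Element;

Document document = Jsoup.connect(newsLink).get();
// select() takes a CSS selector such as "#id"; first() returns null when nothing matches
Element element = document.select("#ins_storybody").first();
if (element == null) {
    throw new RuntimeException("Unable to locate story in: " + newsLink);
}
story = element.text();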

The element can also be retrieved directly by its id (pass the bare id, without the "#"):

Element element = document.getElementById("ins_storybody");
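Because the question looks the container up by class and this answer looks it up by id, one defensive option is to try both. This is a sketch that assumes the page may expose the story container as either `id="ins_storybody"` or `class="ins_storybody"`; `StoryLocator` and `findStory` are hypothetical names, and the HTML in `main` is made up.

```java
import org.jsoup.Jsoup;
import org.jsoup.nodes.Document;
import org.jsoup.nodes.Element;

class StoryLocator {

    // Try the id first; fall back to the first div carrying the class.
    // Returns null when neither is present.
    static Element findStory(Document document) {
        Element byId = document.getElementById("ins_storybody");
        if (byId != null) {
            return byId;
        }
        return document.select("div.ins_storybody").first();
    }

    public static void main(String[] args) {
        Document doc = Jsoup.parse("<div class=\"ins_storybody\">Body text</div>");
        Element story = findStory(doc);
        System.out.println(story == null ? "not found" : story.text()); // prints "Body text"
    }
}
```

The caller still needs the `null` check shown in the sample code above before calling `text()`.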

answered Jan 08 '16 at 08:58 by Stephan, edited Jan 08 '16 at 09:10

You mean like this: `Elements element = doc.getElementsByClass("#ins_storybody");`? – ashif-ismail Jan 08 '16 at 09:00
Hey Stephan, can you answer this question of mine? http://stackoverflow.com/q/34670312/5750634 – ashif-ismail Jan 08 '16 at 09:05
@a_lmukthar Sorry, I don't know this API. – Stephan Jan 08 '16 at 09:11
OK, please forward it to someone who may know, if you can. It's important for me. – ashif-ismail Jan 08 '16 at 09:19
