I need to extract the main news content from a web page. I searched online and found a freely available API named Boilerpipe for that purpose: http://boilerpipe-web.appspot.com/ However, I have not been able to find any Java implementations that use Boilerpipe. Can anyone tell me how to use Boilerpipe in Java to extract the news content, or point me to Java implementations that use Boilerpipe to extract content from a news web page?

hippietrail

dark_shadow
Have you considered using a library like Jsoup? http://jsoup.org/ Do you have a specific website you are trying to scrape? – B. Anderson Apr 13 '12 at 19:35
2 Answers
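My answer may be too late, but it's pretty simple. With the Boilerpipe jar on the classpath:
import java.net.URL;
import de.l3s.boilerpipe.extractors.ArticleExtractor;

URL url = new URL("http://www.nydailynews.com/sports/baseball");
// getText(URL) downloads the page; it throws BoilerpipeProcessingException on failure
String content = ArticleExtractor.INSTANCE.getText(url); // this contains the final text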

samsamara
It's simple. Suppose you need to extract the content of a given URL.
You can use my BoilerPipe Alternative Web API. The service is based on Boilerpipe; I developed it because I kept getting over-quota errors from the original application. You have the option of getting the result back as JSON, which you can then consume in your application.
Best Regards
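For anyone who wants to call a service like this from Java, here is a rough sketch using only the JDK (Java 11+). The endpoint address and the `url` and `output=json` query parameters are assumptions made for illustration, not the documented interface of any particular service, so check the real API before relying on them. `ExtractionClient`, `buildRequestUri` and `fetchJson` are names made up for this example.

```java
import java.io.IOException;
import java.net.URI;
import java.net.URLEncoder;
import java.net.http.HttpClient;
import java.net.http.HttpRequest;
import java.net.http.HttpResponse;
import java.nio.charset.StandardCharsets;

public class ExtractionClient {
    // Hypothetical endpoint: replace with the address of the service you actually use.
    static final String ENDPOINT = "https://extractor.example.invalid/extract";

    // Builds the request URI. The page address is percent-encoded because it is
    // itself a URL carried inside a query parameter.
    // The parameter names "url" and "output" are assumptions.
    static URI buildRequestUri(String endpoint, String pageUrl) {
        String encoded = URLEncoder.encode(pageUrl, StandardCharsets.UTF_8);
        return URI.create(endpoint + "?url=" + encoded + "&output=json");
    }

    // Sends a GET request and returns the raw response body (expected to be JSON).
    static String fetchJson(URI requestUri) throws IOException, InterruptedException {
        HttpClient client = HttpClient.newHttpClient();
        HttpRequest request = HttpRequest.newBuilder(requestUri).GET().build();
        HttpResponse<String> response =
                client.send(request, HttpResponse.BodyHandlers.ofString());
        if (response.statusCode() != 200) {
            throw new IOException("Unexpected HTTP status: " + response.statusCode());
        }
        return response.body();
    }

    public static void main(String[] args) {
        // Only builds and prints the request URI; call fetchJson(uri) once
        // ENDPOINT points at a real service.
        URI uri = buildRequestUri(ENDPOINT, "http://www.nydailynews.com/sports/baseball");
        System.out.println(uri);
    }
}
```

The string returned by `fetchJson` can then be handed to whatever JSON parser your application already uses.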

ashif-ismail