
I need to extract the main news content from a web page. I searched the internet and found a freely available API named Boilerpipe for that purpose (http://boilerpipe-web.appspot.com/), but I have not been able to find any Java implementations that use it. Can anyone tell me how to use Boilerpipe in Java to extract news content, or point me to Java implementations that use Boilerpipe to extract content from a news web page?

hippietrail
dark_shadow
  • Have you considered using a library like Jsoup? http://jsoup.org/ Do you have a specific website you are trying to scrape? – B. Anderson Apr 13 '12 at 19:35

2 Answers


My answer may be too late, but it's pretty simple.

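 import java.net.URL;
 import de.l3s.boilerpipe.extractors.ArticleExtractor;

 URL url = new URL("http://www.nydailynews.com/sports/baseball");
 // getText downloads the page and returns the main article text (throws BoilerpipeProcessingException)
 String content = ArticleExtractor.INSTANCE.getText(url);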
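
If you want a complete class rather than a snippet, something along these lines should work. It is only a sketch: it assumes the Boilerpipe jar and its dependencies (NekoHTML and Xerces) are on the classpath, and the class and method names (`NewsTextExtractor`, `extractFromHtml`, `extractFromUrl`) are made up for illustration. It also shows the `getText(String)` overload, which is handy when you have already downloaded the HTML yourself.

```java
import java.net.MalformedURLException;
import java.net.URL;

import de.l3s.boilerpipe.BoilerpipeProcessingException;
import de.l3s.boilerpipe.extractors.ArticleExtractor;

// Hypothetical wrapper class; only ArticleExtractor and
// BoilerpipeProcessingException come from Boilerpipe itself.
public class NewsTextExtractor {

    /** Extracts the main article text from HTML that is already in memory. */
    static String extractFromHtml(String html) throws BoilerpipeProcessingException {
        return ArticleExtractor.INSTANCE.getText(html);
    }

    /** Downloads the page at the given address and extracts its main article text. */
    static String extractFromUrl(String address)
            throws MalformedURLException, BoilerpipeProcessingException {
        return ArticleExtractor.INSTANCE.getText(new URL(address));
    }

    public static void main(String[] args) {
        // Default address is the one from the snippet above; pass another as the first argument.
        String address = args.length > 0
                ? args[0]
                : "http://www.nydailynews.com/sports/baseball";
        try {
            System.out.println(extractFromUrl(address));
        } catch (MalformedURLException e) {
            System.err.println("Not a valid URL: " + address);
        } catch (BoilerpipeProcessingException e) {
            System.err.println("Boilerpipe could not process the page: " + e.getMessage());
        }
    }
}
```

Boilerpipe also ships other extractors in the same package (for example `DefaultExtractor`), so if `ArticleExtractor` drops too much or too little on a particular site, swapping the extractor is the first thing to try.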
samsamara

It's simple. Suppose you need to extract the content at this URL

Just use my Boilerpipe alternative web API HERE. My service is based on Boilerpipe; I developed it because I kept getting over-quota errors from the original application. You have the option of getting the result back as JSON, so you can simply consume it in your application.

Best Regards

ashif-ismail