I am trying to extract the main content of an article from an HTML page using boilerpipe.

I downloaded the latest jars from here.

I am using the following code:

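// Imports required by the snippet (boilerpipe package names)
import de.l3s.boilerpipe.BoilerpipeProcessingException;
import de.l3s.boilerpipe.extractors.ArticleExtractor;

String article = "";
try {
    // url holds the address of the page to extract
    article = ArticleExtractor.INSTANCE.getText(url);
    System.out.println("Article ++++ >>" + article);
} catch (BoilerpipeProcessingException e) {
    e.printStackTrace();
}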

But this returns an empty string for every URL. Can anyone help me with this?

Pritam Banerjee

1 Answer

Have you tried passing the HTML itself instead of the URL? Alternatively, there may be a problem with the way your URL strings are formatted.
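To check the second possibility, here is a minimal sketch of a helper that turns a raw string into a `java.net.URL` and rejects anything that is not an absolute http(s) address. The class name `UrlCheck` and the method `toHttpUrl` are hypothetical names made up for this illustration; they are not part of boilerpipe.

```java
import java.net.MalformedURLException;
import java.net.URI;
import java.net.URISyntaxException;
import java.net.URL;

// Hypothetical helper for illustration only; not part of boilerpipe.
class UrlCheck {

    // Parses the raw string and returns a URL only if it is an absolute
    // http or https address; otherwise throws MalformedURLException.
    static URL toHttpUrl(String raw) throws MalformedURLException {
        try {
            URI uri = new URI(raw.trim());
            String scheme = uri.getScheme();
            boolean isHttp = scheme != null
                    && (scheme.equalsIgnoreCase("http") || scheme.equalsIgnoreCase("https"));
            if (!isHttp) {
                throw new MalformedURLException("Expected an absolute http(s) URL: " + raw);
            }
            return uri.toURL();
        } catch (URISyntaxException e) {
            // Covers strings with illegal characters such as unescaped spaces
            throw new MalformedURLException(e.getMessage());
        }
    }
}
```

With boilerpipe on the classpath, the result could then be passed as `ArticleExtractor.INSTANCE.getText(UrlCheck.toHttpUrl(raw))`. As far as I know, `getText` has separate overloads for a `String` (treated as HTML markup) and for a `java.net.URL` (fetched and then parsed), so if `url` in the question is a plain `String`, the address itself may be getting parsed as HTML. That would be worth ruling out first.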

Luca Angioloni