
I am using Boilerpipe in my application. When I extract the content with ArticleExtractor, I only get plain text, because all the HTML formatting is removed. So I am trying HtmlHighlighter instead, but its process method fails for certain URLs. Is there any way to pass an HTML string to this method instead? Can anybody explain?

user1685989
  • I found some online Java samples that mentioned HtmlHighlighter, but what is it, and where did you find a .NET port? – winwaed Dec 11 '12 at 17:53
  • Sorry for the delay in replying, this was years ago! Still, there is NBoilerpipe, available on GitHub. – user1685989 Oct 13 '14 at 07:21

1 Answer


You can use IKVM to convert the Boilerpipe jar into a DLL that you can use in your .NET applications. I am using this approach, and it works fine when passing HTML through the different Boilerpipe methods.
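Boilerpipe itself is a Java library, and an IKVM-converted DLL should expose the same classes, so here is a minimal Java sketch of passing an HTML string instead of a URL. It assumes boilerpipe 1.2.x and its dependencies (NekoHTML, Xerces) are on the classpath. The class name `HtmlStringExtraction` and its two helper methods are hypothetical, and note that the library spells the class `HTMLHighlighter`.

```java
import java.io.StringReader;

import org.xml.sax.InputSource;

import de.l3s.boilerpipe.document.TextDocument;
import de.l3s.boilerpipe.extractors.ArticleExtractor;
import de.l3s.boilerpipe.sax.BoilerpipeSAXInput;
import de.l3s.boilerpipe.sax.HTMLHighlighter;

// Hypothetical helper class; assumes boilerpipe 1.2.x is on the classpath.
public class HtmlStringExtraction {

    /** Plain text only: ArticleExtractor drops all markup. */
    public static String extractText(String html) throws Exception {
        return ArticleExtractor.INSTANCE.getText(html);
    }

    /** Main-content fragment that keeps its original HTML tags. */
    public static String extractHtml(String html) throws Exception {
        // Parse the HTML string (not a URL) into Boilerpipe's document model.
        TextDocument doc = new BoilerpipeSAXInput(
                new InputSource(new StringReader(html))).getTextDocument();
        // Let ArticleExtractor mark which text blocks are article content.
        ArticleExtractor.INSTANCE.process(doc);
        // Re-emit only the content blocks from the original HTML string.
        HTMLHighlighter highlighter = HTMLHighlighter.newExtractingInstance();
        return highlighter.process(doc, html);
    }

    public static void main(String[] args) throws Exception {
        String html = "<html><head><title>Demo</title></head><body>"
                + "<div><a href=\"/\">Home</a></div>"
                + "<div><p>Some article text that you fetched yourself.</p></div>"
                + "</body></html>";
        System.out.println(extractText(html));
        System.out.println(extractHtml(html));
    }
}
```

`HTMLHighlighter.newHighlightingInstance()` would instead return the full page with the content blocks marked. The URL overload, `process(URL, BoilerpipeExtractor)`, downloads the page itself, so fetching the HTML on your own and using the String overload takes that download step out of Boilerpipe's hands.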

If the content of the page you are trying to access is loaded by JavaScript, a simple HTTP request can't retrieve it. You first need to obtain the resulting HTML after the JavaScript has run, and then pass that HTML to Boilerpipe.