NBoilerPipe is a Mono port of the BoilerPipe Java library. I've managed to get it working in .NET 4 without too much trouble (a few library references needed fixing). However, searching through the code, I cannot find any hooks for HTML output. For example, the GetText() method takes only a single parameter for the input, and I cannot see any other methods that would return HTML instead of plain text. How can I get HTML output from NBoilerPipe?
Here is the sample NBoilerPipe code that works in .NET 4:
    String url = "http:// <etc> ";
    String page = String.Empty;
    WebRequest request = WebRequest.Create (url);
    HttpWebResponse response = (HttpWebResponse)request.GetResponse ();
    Stream stream = response.GetResponseStream ();
    using (StreamReader streamReader = new StreamReader (stream, Encoding.UTF8))
    {
        page = streamReader.ReadToEnd ();
    }
    String text = ArticleExtractor.INSTANCE.GetText (page);
    Console.WriteLine ("Text: \n" + text);