I am using the boilerpipe library to analyze news articles. These articles contain a lot of boilerplate, such as copyright information, a side pane of related articles, etc. Boilerpipe removes all of that. Is it possible to get the boilerplate back instead? I need to analyze the copyright statement and similar blocks and extract some information from them.
Also, does it provide some sort of confidence score for each text block indicating whether it is boilerplate or not?
Thanks.