I'm designing a program for entity extraction from HTML pages. I have a rough design, but I'm not happy with it, because it strongly couples my algorithm classes to the HTML parser I chose. I'd be happy to hear suggestions for a better design.
My design is as follows:
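import java.net.URI;
import java.util.List;
import java.util.function.Predicate;
public interface HTMLSearcherInterface
{
    // Loads and parses the HTML document at the given URI.
    void readHTML(URI uri);
    // Returns every result whose text satisfies the predicate.
    List<SearchResultInterface> searchContent(Predicate<String> predicate);
}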
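public interface SearchResultInterface
{
    // Text of the matched content.
    String getResultText();
    // Node is the HTML parser's own node type; exposing it is what couples callers to the parser.
    Node getResultNode();
}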
I also have an EntityExtractor class, which holds an HTMLSearcherInterface and uses it to search the HTML file for keywords, around which it then looks for other details. This is why I need getResultNode from the search results.
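To make the coupling concrete, here is a minimal sketch of the usage described above. It is for illustration only: the `Node` stub stands in for whatever node type the chosen parser provides, and `findContexts`, `getParent`, and the keyword argument are hypothetical names, not part of the actual code.

```java
import java.net.URI;
import java.util.ArrayList;
import java.util.List;
import java.util.function.Predicate;

// Placeholder for the parser's node type (hypothetical; the real one comes from the HTML library).
interface Node {
    Node getParent();
}

interface SearchResultInterface {
    String getResultText();
    Node getResultNode();
}

interface HTMLSearcherInterface {
    void readHTML(URI uri);
    List<SearchResultInterface> searchContent(Predicate<String> predicate);
}

class EntityExtractor {
    private final HTMLSearcherInterface searcher;

    EntityExtractor(HTMLSearcherInterface searcher) {
        this.searcher = searcher;
    }

    // Hypothetical method: finds text containing the keyword and returns the
    // parent node of each hit, where the surrounding details would be looked up.
    List<Node> findContexts(URI page, String keyword) {
        searcher.readHTML(page);
        List<Node> contexts = new ArrayList<>();
        for (SearchResultInterface result : searcher.searchContent(text -> text.contains(keyword))) {
            // This call is where the extractor depends on the parser's Node type.
            contexts.add(result.getResultNode().getParent());
        }
        return contexts;
    }
}
```

In this sketch the searcher itself sits behind an interface, yet EntityExtractor still has to navigate `Node` objects directly, so swapping the parser would mean changing the extraction algorithm as well.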