I am working on a product where I need to parse HTML documents. I have looked at Jericho, TagSoup, Jsoup and Crawl4J. Which parser should I use, given that the parsing will run in a multithreaded environment as jobs scheduled with Quartz?
If 10 threads are running at the same time, I need an API with a low memory footprint. I read somewhere that Jericho is a text-based search API and therefore consumes less memory. Is that right? Or should I go with one of the others, and if so, why?