I am trying to build a custom XPath ContentHandler for Tika that recognizes complex XPath expressions, using code from org/apache/tika/sax/BodyContentHandler.java as a starting point (I already use Tika for other things).
This XPath works:
/xhtml:html/xhtml:body/descendant::node()
but this one does not:
//xhtml:div[@id='someid']/descendant::node()
I want to combine Tika's ContentHandler (because it fixes unbalanced tags and invalid characters in HTML content) with the XPath evaluator from javax.xml.xpath. What is the proper way to do that? Is there a way to get an InputSource once Tika has parsed and repaired the HTML content?
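To make the goal concrete, here is a minimal sketch of the kind of bridge I have in mind, not a confirmed Tika recipe. It assumes that the SAX events Tika emits can be collected into a DOM through a JDK `TransformerHandler` writing to a `DOMResult`, and that the resulting tree can then be queried with javax.xml.xpath. The class name `XhtmlXPathSketch` and the method `select` are hypothetical, and the JDK's own SAX parser stands in for Tika here so the example runs without Tika on the classpath. With Tika, the same handler would presumably be passed to `Parser.parse(InputStream, ContentHandler, Metadata, ParseContext)` instead.

```java
import java.io.StringReader;
import java.util.Collections;
import java.util.Iterator;
import javax.xml.XMLConstants;
import javax.xml.namespace.NamespaceContext;
import javax.xml.parsers.SAXParserFactory;
import javax.xml.transform.dom.DOMResult;
import javax.xml.transform.sax.SAXTransformerFactory;
import javax.xml.transform.sax.TransformerHandler;
import javax.xml.xpath.XPath;
import javax.xml.xpath.XPathConstants;
import javax.xml.xpath.XPathFactory;
import org.w3c.dom.NodeList;
import org.xml.sax.InputSource;
import org.xml.sax.XMLReader;

// Hypothetical helper: collects SAX events into a DOM, then runs full XPath 1.0 on it.
final class XhtmlXPathSketch {
    static final String XHTML_NS = "http://www.w3.org/1999/xhtml";

    static NodeList select(String xhtml, String expression) {
        try {
            // An identity TransformerHandler is a ContentHandler that builds a DOM.
            SAXTransformerFactory stf =
                    (SAXTransformerFactory) SAXTransformerFactory.newInstance();
            TransformerHandler domBuilder = stf.newTransformerHandler();
            DOMResult dom = new DOMResult();
            domBuilder.setResult(dom);

            // Stand-in event source (assumption): a namespace-aware JDK SAX parser.
            // With Tika, domBuilder would be handed to the parser as its ContentHandler.
            SAXParserFactory spf = SAXParserFactory.newInstance();
            spf.setNamespaceAware(true);
            XMLReader reader = spf.newSAXParser().getXMLReader();
            reader.setContentHandler(domBuilder);
            reader.parse(new InputSource(new StringReader(xhtml)));

            // Bind the "xhtml" prefix so expressions can address XHTML elements.
            XPath xpath = XPathFactory.newInstance().newXPath();
            xpath.setNamespaceContext(new NamespaceContext() {
                @Override
                public String getNamespaceURI(String prefix) {
                    return "xhtml".equals(prefix) ? XHTML_NS : XMLConstants.NULL_NS_URI;
                }

                @Override
                public String getPrefix(String uri) {
                    return XHTML_NS.equals(uri) ? "xhtml" : null;
                }

                @Override
                public Iterator<String> getPrefixes(String uri) {
                    return XHTML_NS.equals(uri)
                            ? Collections.singletonList("xhtml").iterator()
                            : Collections.<String>emptyIterator();
                }
            });

            // dom.getNode() is the Document built from the SAX events.
            return (NodeList) xpath.evaluate(
                    expression, dom.getNode(), XPathConstants.NODESET);
        } catch (Exception e) {
            throw new IllegalStateException("XPath evaluation failed", e);
        }
    }
}
```

Calling `XhtmlXPathSketch.select(doc, "//xhtml:div[@id='someid']/descendant::node()")` would then return the nodes under the matching div. The trade-off is that the whole document is held in memory as a DOM, unlike Tika's streaming matcher.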