I would override the mapSafeElement, mapSafeAttribute and isDiscardElement methods of HtmlMapper so that this element stays accessible during the parse. Tika's default mapper may be rejecting the non-standard (non-"safe") attribute "data-postid". A mapper that keeps every tag and attribute is shown at the end of this answer.
You would then register that mapper with the parser through the ParseContext object, as follows:
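// Classes used: org.apache.tika.parser.ParseContext, org.apache.tika.parser.html.HtmlParser,
// org.apache.tika.parser.html.HtmlMapper, org.apache.tika.metadata.Metadata, org.apache.tika.sax.ToXMLContentHandler.
InputStream input = <your Uri/file/string input stream>;
ParseContext parseContext = new ParseContext();
parseContext.set(HtmlMapper.class, new AllTagMapper());
HtmlParser parser = new HtmlParser();
// ContentHandler is an interface, so pass a concrete handler; ToXMLContentHandler keeps tags and attributes in its output.
ContentHandler handler = new ToXMLContentHandler();
parser.parse(input, handler, new Metadata(), parseContext);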
// Override HtmlMapper to process all tags and attributes.
class AllTagMapper implements HtmlMapper {
    @Override
    public String mapSafeElement(String name) {
        return name.toLowerCase();
    }

    @Override
    public boolean isDiscardElement(String name) {
        return false;
    }

    @Override
    public String mapSafeAttribute(String elementName, String attributeName) {
        return attributeName.toLowerCase();
    }
}
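
Once the mapper lets the attribute through, you still need a handler that reads it. Below is a minimal sketch of one way to do that, using only the JDK's SAX classes. PostIdCollector is a hypothetical helper name, not part of Tika, and checking both the local name and the qualified name is an assumption made because I have not verified which of the two the parser fills in for this attribute.

```java
import java.util.ArrayList;
import java.util.List;
import org.xml.sax.Attributes;
import org.xml.sax.helpers.AttributesImpl;
import org.xml.sax.helpers.DefaultHandler;

// Hypothetical helper (not part of Tika): records every data-postid value it sees.
class PostIdCollector extends DefaultHandler {
    private final List<String> postIds = new ArrayList<>();

    @Override
    public void startElement(String uri, String localName, String qName, Attributes atts) {
        for (int i = 0; i < atts.getLength(); i++) {
            // Assumption: check both names, since the parser may populate either one.
            if ("data-postid".equals(atts.getLocalName(i))
                    || "data-postid".equals(atts.getQName(i))) {
                postIds.add(atts.getValue(i));
            }
        }
    }

    public List<String> getPostIds() {
        return postIds;
    }

    // Simulates a single SAX start-element event so the class can be tried without Tika.
    public static void main(String[] args) {
        AttributesImpl atts = new AttributesImpl();
        atts.addAttribute("", "data-postid", "data-postid", "CDATA", "42");
        PostIdCollector collector = new PostIdCollector();
        collector.startElement("", "div", "div", atts);
        System.out.println(collector.getPostIds());
    }
}
```

To use it with Tika, pass a PostIdCollector instance as the second argument of parser.parse(...) and call getPostIds() after the parse returns.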