I am using Apache Tika 1.7 and Apache POI to extract text from .doc and .docx documents in a Maven-built project. For some reason I am getting the
java.lang.NoSuchMethodError: org.apache.poi.util.IOUtils.calculateChecksum
error. According to the Apache POI FAQ, this is caused by a version conflict, so the obvious solution would be to upgrade POI. The difficulty is that I am using the version of POI that is bundled with Tika in the tika-parsers package. I depend on tika-parsers because I use the Tika type detector, which is the only part of Tika I use (apart from POI).
If I depend only on tika-core and declare the POI dependencies separately in the Maven pom.xml, the Tika detector stops detecting container types such as .docx files, because the tika-parsers package is necessary for the detector, as stated here. So, how can I solve this? I want accurate type detection with Tika, but I also want to use Apache POI independently of Tika.
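To narrow down a version conflict like this, it can help to check which jar the class from the stack trace is actually loaded from at runtime. The following is only a minimal sketch: the class name WhichJar and the helper locationOf are made up for illustration, and it assumes it is run with the same classpath as the failing application.

```java
import java.security.CodeSource;

public class WhichJar {
    // Returns the location (jar or directory) a class was loaded from,
    // or a note when the JVM does not expose one (e.g. core JDK classes).
    static String locationOf(Class<?> type) {
        CodeSource source = type.getProtectionDomain().getCodeSource();
        return source == null
                ? "(no code source, likely a JDK class)"
                : source.getLocation().toString();
    }

    public static void main(String[] args) throws ClassNotFoundException {
        // Defaults to the class named in the NoSuchMethodError above.
        String name = args.length > 0 ? args[0] : "org.apache.poi.util.IOUtils";
        System.out.println(name + " loaded from " + locationOf(Class.forName(name)));
    }
}
```

If the printed jar is not the POI version expected, running mvn dependency:tree should show which dependency pulls in that jar.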
Thanks