I am working with Apache Tika 1.7 and Apache POI to extract text from .doc and .docx documents in a Maven-built project. For some reason I am getting this error:

java.lang.NoSuchMethodError: org.apache.poi.util.IOUtils.calculateChecksum

According to the Apache POI FAQ, this is caused by a version conflict, so the obvious fix would be to upgrade POI. The catch is that I am using the version of POI that is bundled with Tika in the tika-parsers package, because I use the Tika type detector (the only part of Tika I use, apart from POI). If I depend only on tika-core and declare the POI dependencies separately in the Maven pom.xml, the Tika detector stops detecting container types such as .docx files, because the detector needs the tika-parsers package, as stated here. So, how can I solve this? I want accurate type detection with Tika, but I also want to use Apache POI independently of Tika.

Thanks

user4052054

1 Answer

I don't know what your POM looks like, but in most cases this type of problem can be dealt with by excluding the offending transitive dependencies.

It would look something like this:

<dependency>
    <groupId>org.apache.tika</groupId>
    <artifactId>tika-parsers</artifactId>
    <version>1.7</version>
    <exclusions>
        <!-- Drop the POI artifact that tika-parsers pulls in transitively -->
        <exclusion>
            <groupId>org.apache.poi</groupId>
            <artifactId>poi</artifactId>
        </exclusion>
    </exclusions>
</dependency>
<!-- Declare the POI version you want explicitly -->
<dependency>
    <groupId>org.apache.poi</groupId>
    <artifactId>poi</artifactId>
    <version>3.11</version>
</dependency>

However, looking at the POM for Tika 1.7, it already depends on POI 3.11, which is currently the most recent version and does include the needed method. So, in all likelihood, another dependency in your project is pulling in an older version of POI, and Maven is resolving to that one.
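If the older POI arrives through a transitive dependency, one alternative to exclusions is to pin the version in a dependencyManagement section, which Maven applies to transitive dependencies as well. This is only a sketch of that alternative, not something taken from your POM: it assumes 3.11 is the version you want, and that poi-ooxml and poi-scratchpad are the companion POI artifacts in your tree (entries for artifacts that are not present have no effect).

```xml
<dependencyManagement>
    <dependencies>
        <!-- Assumption: 3.11 is the target version; keep all POI modules on the same release -->
        <dependency>
            <groupId>org.apache.poi</groupId>
            <artifactId>poi</artifactId>
            <version>3.11</version>
        </dependency>
        <dependency>
            <groupId>org.apache.poi</groupId>
            <artifactId>poi-ooxml</artifactId>
            <version>3.11</version>
        </dependency>
        <dependency>
            <groupId>org.apache.poi</groupId>
            <artifactId>poi-scratchpad</artifactId>
            <version>3.11</version>
        </dependency>
    </dependencies>
</dependencyManagement>
```

Pinning leaves the legacy library's own declaration untouched, so it only helps if that library still works against the newer POI.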

You can use the Maven dependency plugin to find the offending library, and then apply the same kind of exclusion to that library to resolve the conflict.
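As a rough illustration: running mvn dependency:tree -Dincludes=org.apache.poi prints only the branches of the dependency tree that lead to POI artifacts, which should show which library brings in the old version. The exclusion then goes on that library instead of on tika-parsers. In the sketch below, com.example:legacy-lib and its version are hypothetical placeholders for whatever the tree output reports.

```xml
<!-- Hypothetical legacy dependency; replace the coordinates with the library reported by dependency:tree -->
<dependency>
    <groupId>com.example</groupId>
    <artifactId>legacy-lib</artifactId>
    <version>1.0</version>
    <exclusions>
        <!-- Keep this library from dragging in its older POI -->
        <exclusion>
            <groupId>org.apache.poi</groupId>
            <artifactId>poi</artifactId>
        </exclusion>
    </exclusions>
</dependency>
```

With the old POI excluded here, the 3.11 version that tika-parsers already depends on should be the only one left on the classpath.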

Robby Cornelissen
    Yes, you're right. I tried by turning off dependency after dependency and discovered I have a legacy dependency which pulls in an older version of POI. Thanks. – user4052054 Apr 20 '15 at 00:14