I have the following test code to detect the content type of a DOCX file:
@Test
public void testContentTypeOfaWordDOCXFileIsReturnedCorrectlyByTheServer() throws IOException, TikaException {
    File docxFile = new File(FILE_COMPLETE_PATH);
    InputStream inputStream = new FileInputStream(docxFile);
    MediaType mediaType = spyServlet.getServerInducedType(inputStream);
    assertEquals(DOCX_TYPE, mediaType);
}
The getServerInducedType method is implemented as follows:
protected MediaType getServerInducedType(InputStream inputStream) throws IOException, TikaException {
    try (BufferedInputStream buffStream = new BufferedInputStream(inputStream);
         TikaInputStream tikaInputStream = TikaInputStream.get(buffStream)) {
        TikaConfig tikaConfig = new TikaConfig();
        Detector detector = tikaConfig.getDetector();
        Metadata metadata = new Metadata();
        MediaType mediaType = detector.detect(tikaInputStream, metadata);
        return mediaType;
    }
}
Question: When I run the above test, I expect to get DOCX_TYPE, which is "application/x-tika-ooxml", but I get "application/zip" instead. Why?
P.S. I do not have a tika.config system property or a TIKA_CONFIG environment variable set (see here).
I have also added tika-parsers and tika-core to the pom file (see here).
This is the output that I get:
java.lang.AssertionError:
Expected :application/x-tika-ooxml
Actual   :application/zip
I also tested it with a JPG file, and Tika detects that correctly as image/jpeg.
My pom file has the following dependencies:
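As background on why these two types are easy to confuse: a .docx file is itself a ZIP container, so it starts with the same magic bytes as any other ZIP archive, and only a look at the entries inside (for example `[Content_Types].xml`) can tell them apart. The sketch below illustrates that distinction using only the JDK. It is not Tika code and does not claim to show what Tika does internally or why my test fails; the class name `ZipContainerSketch` and its helper methods are hypothetical names made up for this illustration.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.nio.charset.StandardCharsets;
import java.util.zip.ZipEntry;
import java.util.zip.ZipInputStream;
import java.util.zip.ZipOutputStream;

// Hypothetical illustration class; not part of Tika or of my project.
public class ZipContainerSketch {

    // Builds a tiny ZIP archive in memory that contains one named entry.
    public static byte[] zipWithEntry(String entryName) throws IOException {
        ByteArrayOutputStream bytes = new ByteArrayOutputStream();
        try (ZipOutputStream zip = new ZipOutputStream(bytes)) {
            zip.putNextEntry(new ZipEntry(entryName));
            zip.write("<Types/>".getBytes(StandardCharsets.UTF_8));
            zip.closeEntry();
        }
        return bytes.toByteArray();
    }

    // Magic-byte check: a ZIP local file header starts with 'P' 'K' 0x03 0x04.
    public static boolean hasZipMagic(byte[] data) {
        return data.length >= 4
                && data[0] == 'P' && data[1] == 'K'
                && data[2] == 3 && data[3] == 4;
    }

    // Container check: walks the entries and looks for the OOXML marker entry.
    public static boolean hasContentTypesEntry(byte[] data) throws IOException {
        try (ZipInputStream zip = new ZipInputStream(new ByteArrayInputStream(data))) {
            for (ZipEntry e = zip.getNextEntry(); e != null; e = zip.getNextEntry()) {
                if ("[Content_Types].xml".equals(e.getName())) {
                    return true;
                }
            }
        }
        return false;
    }

    public static void main(String[] args) throws IOException {
        byte[] ooxmlLike = zipWithEntry("[Content_Types].xml");
        byte[] plainZip = zipWithEntry("notes.txt");

        // Both archives pass the magic-byte check; only one has the marker entry.
        System.out.println(hasZipMagic(ooxmlLike) + " " + hasContentTypesEntry(ooxmlLike)); // true true
        System.out.println(hasZipMagic(plainZip) + " " + hasContentTypesEntry(plainZip));   // true false
    }
}
```

A JPEG, by contrast, is not a container of this kind, which would be consistent with the JPG file being identified from its leading bytes alone.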
<dependency>
    <groupId>org.apache.tika</groupId>
    <artifactId>tika-core</artifactId>
    <version>1.9</version>
</dependency>
<dependency>
    <groupId>org.apache.tika</groupId>
    <artifactId>tika-parsers</artifactId>
    <version>1.9</version>
</dependency>