
I am using tika-parsers as part of a web application

<dependency>
    <groupId>org.apache.tika</groupId>
    <artifactId>tika-parsers</artifactId>
    <version>1.11</version>
</dependency>

and had problems deploying it on WildFly (8.2.1 and 10.0.0.RC4). This was resolved by adding a jboss-all.xml containing:

<jboss xmlns="urn:jboss:1.0">
    <weld xmlns="urn:jboss:weld:1.0" require-bean-descriptor="true"/>
</jboss>

But now Tika returns empty strings for e.g. PDF or MS Office files. I assume it is falling back to the EmptyParser. Text files seem to work.

This is my simple test code, which works correctly when run as a JUnit test:

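import org.apache.tika.metadata.Metadata;
import org.apache.tika.parser.AutoDetectParser;
import org.apache.tika.sax.BodyContentHandler;

// entry.getValue() is expected to be the InputStream of the document to parse
AutoDetectParser parser = new AutoDetectParser();
BodyContentHandler handler = new BodyContentHandler(9000000); // write limit in characters
Metadata metadata = new Metadata();
parser.parse(entry.getValue(), handler, metadata);
String s = handler.toString();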
Philipp
  • Did you try following the [Apache Tika Troubleshooting guide for "No Content"](http://wiki.apache.org/tika/Troubleshooting%20Tika#No_Content_Extracted)? How far did you get through that? – Gagravarr Dec 23 '15 at 13:53
  • It is showing the correct version ("Apache Tika 1.11") and detects the MIME type of my files correctly, but still uses the org.apache.tika.parser.EmptyParser for e.g. PDF and DOC. – Philipp Dec 23 '15 at 15:13
  • What about the parser checks - did they show you as having all the parsers you'd expect as available? – Gagravarr Dec 23 '15 at 20:01
  • Multiple parsers show up, including those I expected, but the EmptyParser is still chosen when testing. The only strange thing I found is that every parser is listed twice while iterating... – Philipp Jan 11 '16 at 19:00
  • I got it. It seems the way I iterated through my streams did not work correctly. Thanks for your help though! – Philipp Jan 14 '16 at 08:33
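
The last comment does not say what exactly went wrong with the stream iteration, so the following is only a guess at one common cause: an InputStream that has already been read to the end delivers no further bytes, and anything reading it afterwards sees an empty document. The sketch below uses only java.io; the class name StreamReuseDemo, the helper drain and the sample bytes are hypothetical and stand in for the real upload handling.

```java
import java.io.ByteArrayInputStream;
import java.io.IOException;
import java.io.InputStream;
import java.io.UncheckedIOException;
import java.nio.charset.StandardCharsets;

public class StreamReuseDemo {

    // Reads the stream to its end and returns the number of bytes seen.
    public static int drain(InputStream in) {
        try {
            byte[] buffer = new byte[4096];
            int total = 0;
            int n;
            while ((n = in.read(buffer)) != -1) {
                total += n;
            }
            return total;
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    public static void main(String[] args) {
        // Hypothetical stand-in for an uploaded document.
        byte[] data = "stand-in document bytes".getBytes(StandardCharsets.UTF_8);

        InputStream shared = new ByteArrayInputStream(data);
        int firstPass = drain(shared);   // e.g. an earlier loop or a logging step
        int secondPass = drain(shared);  // what a consumer called afterwards receives

        System.out.println("first pass:  " + firstPass + " bytes");
        System.out.println("second pass: " + secondPass + " bytes"); // 0, the stream is exhausted

        // Giving each consumer its own stream avoids the problem.
        int fresh = drain(new ByteArrayInputStream(data));
        System.out.println("fresh stream: " + fresh + " bytes");
    }
}
```

If something like this is the cause, opening a fresh stream per document (or buffering the bytes once and wrapping them in a new ByteArrayInputStream for each consumer) before calling parser.parse should be enough.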
