I get the errors below (one per file) when uploading any Office 2007 document (e.g. pptx, docx, xlsx) into Sling. I am using Sling 6 stable standalone.

Is anyone else experiencing this? Are there any known issues with the Tika bundle?

Thanks

23.01.2013 14:32:27.248 *WARN* [jackrabbit-pool-1] org.apache.jackrabbit.core.query.lucene.LazyTextExtractorField Failed to extract text from a binary property org.apache.tika.exception.TikaException: Unexpected RuntimeException from org.apache.tika.parser.microsoft.ooxml.OOXMLParser@5217e8de
                at org.apache.tika.parser.CompositeParser.parse(CompositeParser.java:122)
                at org.apache.tika.parser.AutoDetectParser.parse(AutoDetectParser.java:101)
                at org.apache.jackrabbit.core.query.lucene.JackrabbitParser.parse(JackrabbitParser.java:189)
                at org.apache.jackrabbit.core.query.lucene.LazyTextExtractorField$ParsingTask.run(LazyTextExtractorField.java:174)
                at java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:439)
                at java.util.concurrent.FutureTask$Sync.innerRun(FutureTask.java:303)
                at java.util.concurrent.FutureTask.run(FutureTask.java:138)
                at java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask.access$301(ScheduledThreadPoolExecutor.java:98)
                at java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask.run(ScheduledThreadPoolExecutor.java:206)
                at java.util.concurrent.ThreadPoolExecutor$Worker.runTask(ThreadPoolExecutor.java:886)
                at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:908)
                at java.lang.Thread.run(Thread.java:662)
Caused by: org.apache.poi.POIXMLException: java.lang.reflect.InvocationTargetException
                at org.apache.poi.xwpf.usermodel.XWPFFactory.createDocumentPart(XWPFFactory.java:60)
                at org.apache.poi.POIXMLDocumentPart.read(POIXMLDocumentPart.java:256)
                at org.apache.poi.POIXMLDocument.load(POIXMLDocument.java:196)
                at org.apache.poi.xwpf.usermodel.XWPFDocument.<init>(XWPFDocument.java:94)
                at org.apache.poi.xwpf.extractor.XWPFWordExtractor.<init>(XWPFWordExtractor.java:45)
                at org.apache.poi.extractor.ExtractorFactory.createExtractor(ExtractorFactory.java:111)
                at org.apache.poi.extractor.ExtractorFactory.createExtractor(ExtractorFactory.java:86)
                at org.apache.tika.parser.microsoft.ooxml.OOXMLParser.parse(OOXMLParser.java:47)
                at org.apache.tika.parser.CompositeParser.parse(CompositeParser.java:120)
                ... 11 more
Caused by: java.lang.reflect.InvocationTargetException
                at sun.reflect.NativeConstructorAccessorImpl.newInstance0(Native Method)
                at sun.reflect.NativeConstructorAccessorImpl.newInstance(NativeConstructorAccessorImpl.java:39)
                at sun.reflect.DelegatingConstructorAccessorImpl.newInstance(DelegatingConstructorAccessorImpl.java:27)
                at java.lang.reflect.Constructor.newInstance(Constructor.java:513)
                at org.apache.poi.xwpf.usermodel.XWPFFactory.createDocumentPart(XWPFFactory.java:58)
                ... 19 more
Caused by: java.lang.NoClassDefFoundError: org/openxmlformats/schemas/wordprocessingml/x2006/main/SettingsDocument$Factory
                at org.apache.poi.xwpf.usermodel.XWPFSettings.readFrom(XWPFSettings.java:129)
                at org.apache.poi.xwpf.usermodel.XWPFSettings.<init>(XWPFSettings.java:43)
                ... 24 more
Caused by: java.lang.ClassNotFoundException: org.openxmlformats.schemas.wordprocessingml.x2006.main.SettingsDocument$Factory not found by org.apache.tika.bundle [63]
                at org.apache.felix.framework.ModuleImpl.findClassOrResourceByDelegation(ModuleImpl.java:787)
                at org.apache.felix.framework.ModuleImpl.access$400(ModuleImpl.java:71)
                at org.apache.felix.framework.ModuleImpl$ModuleClassLoader.loadClass(ModuleImpl.java:1768)
                at java.lang.ClassLoader.loadClass(ClassLoader.java:247)
                ... 26 more
Simulant
NabilS
  • Looks like the same issue as https://jira.sakaiproject.org/browse/KERN-1245 – Bertrand Delacretaz Jan 25 '13 at 08:46
  • You're missing one of the [Apache POI Dependencies](http://poi.apache.org/overview.html#components) - specifically the poi-ooxml-schemas jar. I don't know how you're getting POI and Tika, but investigate that to see why that required jar isn't there – Gagravarr Jan 25 '13 at 09:15

1 Answer

This was caused by missing or incorrect dependencies in the Tika 0.6 bundle.

To make it work, I had to rebuild Tika 0.6 with the changes below and then replace the Tika bundle inside the Sling standalone jar file. I am a Java beginner, so please let me know if there is a better way to do this. Thanks

Changes made to tika-0.6.tika-parsers.pom.xml:

Added:

    <dependency>
      <groupId>org.apache.poi</groupId>
      <artifactId>ooxml-schemas</artifactId>
      <version>1.1</version>
    </dependency>
    <dependency>
      <groupId>org.apache.poi</groupId>
      <artifactId>poi-ooxml-schemas</artifactId>
      <version>${poi.version}</version>
    </dependency>

Removed:

    <dependency>
      <groupId>org.apache.geronimo.specs</groupId>
      <artifactId>geronimo-stax-api_1.0_spec</artifactId>
      <version>1.0.1</version>
    </dependency>
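
Before swapping a rebuilt bundle into the Sling jar, it may help to confirm that the bundle actually carries the class named in the ClassNotFoundException above. The following is only a minimal sketch under stated assumptions, not part of the original fix: `BundleClassCheck`, `toEntryName` and `containsClass` are hypothetical names, and the code assumes the class sits either directly in the bundle jar or inside a jar embedded one level deep. It uses only `java.util.zip` from the JDK.

```java
import java.io.IOException;
import java.io.InputStream;
import java.util.Enumeration;
import java.util.zip.ZipEntry;
import java.util.zip.ZipFile;
import java.util.zip.ZipInputStream;

// Hypothetical helper, for illustration only.
class BundleClassCheck {

    /** Converts a binary class name (dots, '$' for nested classes) to its jar entry path. */
    static String toEntryName(String className) {
        return className.replace('.', '/') + ".class";
    }

    /**
     * Returns true if the jar holds the class either as a direct entry
     * or inside a jar embedded one level deep (assumed layout).
     */
    static boolean containsClass(String jarPath, String className) throws IOException {
        String wanted = toEntryName(className);
        try (ZipFile jar = new ZipFile(jarPath)) {
            // Case 1: the class is a direct entry of the bundle.
            if (jar.getEntry(wanted) != null) {
                return true;
            }
            // Case 2: the class lives in an embedded jar.
            Enumeration<? extends ZipEntry> entries = jar.entries();
            while (entries.hasMoreElements()) {
                ZipEntry entry = entries.nextElement();
                if (entry.isDirectory() || !entry.getName().endsWith(".jar")) {
                    continue;
                }
                try (InputStream in = jar.getInputStream(entry);
                     ZipInputStream nested = new ZipInputStream(in)) {
                    ZipEntry inner;
                    while ((inner = nested.getNextEntry()) != null) {
                        if (inner.getName().equals(wanted)) {
                            return true;
                        }
                    }
                }
            }
        }
        return false;
    }

    public static void main(String[] args) throws IOException {
        if (args.length < 1) {
            System.err.println("usage: java BundleClassCheck <path-to-bundle.jar>");
            return;
        }
        // Class name taken from the stack trace in the question.
        String missing =
            "org.openxmlformats.schemas.wordprocessingml.x2006.main.SettingsDocument$Factory";
        System.out.println(containsClass(args[0], missing) ? "found" : "not found");
    }
}
```

Run it with the path to the rebuilt Tika bundle as the only argument. Note that finding the class in the jar does not by itself prove the OSGi bundle can load it, since that also depends on the bundle manifest (for example its Bundle-ClassPath), which this sketch does not inspect.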
NabilS
  • Tika 0.6 is very old. Is there a reason why you haven't upgraded to the most recent version? – Gagravarr Jan 27 '13 at 12:36
  • Yes, only because Sling has not been updated. Sling 6, which is the latest stable release of Sling, uses an old version of Jackrabbit (I think 2.1), which in turn uses Tika 0.6. – NabilS Jan 27 '13 at 12:52