
I'm trying to extract the content of a large dataset that contains a mix of file types (PDF, DOC, PPT).

I'm using tika-app-1.12.jar. When I run my code, everything works fine at first, and then I get this error:

Exception in thread "main" org.apache.tika.exception.TikaException: Unexpected RuntimeException from org.apache.tika.parser.microsoft.OfficeParser@3ea25501
        at org.apache.tika.parser.CompositeParser.parse(CompositeParser.java:258)
        at org.apache.tika.parser.CompositeParser.parse(CompositeParser.java:256)
        at org.apache.tika.parser.AutoDetectParser.parse(AutoDetectParser.java:120)
        at recruitmentprototyp.RecruitmentPrototyp.tikareadDoc(RecruitmentPrototyp.java:135)
        at recruitmentprototyp.RecruitmentPrototyp.doForAll(RecruitmentPrototyp.java:110)
        at recruitmentprototyp.RecruitmentPrototyp.main(RecruitmentPrototyp.java:897)
Caused by: java.lang.IllegalStateException: Pap style 19 claimed to have itself as its parent, which isn't allowed
        at org.apache.poi.hwpf.model.StyleSheet.createPap(StyleSheet.java:232)
        at org.apache.poi.hwpf.model.StyleSheet.<init>(StyleSheet.java:120)
        at org.apache.poi.hwpf.HWPFDocument.<init>(HWPFDocument.java:346)
        at org.apache.tika.parser.microsoft.WordExtractor.parse(WordExtractor.java:81)
        at org.apache.tika.parser.microsoft.OfficeParser.parse(OfficeParser.java:201)
        at org.apache.tika.parser.microsoft.OfficeParser.parse(OfficeParser.java:172)
        at org.apache.tika.parser.CompositeParser.parse(CompositeParser.java:256)
        ... 5 more
Java Result: 1

What should I do?
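The trace shows the failure coming from a single Word document whose style sheet POI rejects, yet it ends the whole run because the exception propagates up to main. One possible stopgap is to catch the failure per file so the rest of the dataset is still processed. The following is only a minimal sketch of that idea: `BatchExtractSketch`, `Extractor`, `Result`, and `extractAll` are hypothetical names, and `Extractor` stands in for whatever `tikareadDoc` does (its real body is not shown here), so no Tika API is assumed.

```java
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

public class BatchExtractSketch {

    /** Hypothetical stand-in for the per-file extraction call (e.g. a method wrapping the Tika parse). */
    public interface Extractor {
        String extract(String path) throws Exception;
    }

    /** Collects extracted text and per-file failure messages, keyed by path. */
    public static final class Result {
        public final Map<String, String> texts = new LinkedHashMap<>();
        public final Map<String, String> failures = new LinkedHashMap<>();
    }

    /** Runs the extractor on every path; a failing file is recorded and skipped instead of aborting the run. */
    public static Result extractAll(List<String> paths, Extractor extractor) {
        Result result = new Result();
        for (String path : paths) {
            try {
                result.texts.put(path, extractor.extract(path));
            } catch (Exception e) {
                // Walk to the innermost cause, which is usually the most informative message
                // (in the trace above, that is the IllegalStateException raised inside POI).
                Throwable root = e;
                while (root.getCause() != null && root.getCause() != root) {
                    root = root.getCause();
                }
                result.failures.put(path, root.getClass().getSimpleName() + ": " + root.getMessage());
            }
        }
        return result;
    }
}
```

This does not repair the offending document; it only records which files failed so they can be inspected or reprocessed separately.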

Andy Turner
Abeer zaroor

0 Answers