I'm trying to extract the content of a large dataset that contains a mix of files (pdf, doc, ppt).
I'm using tika-app-1.12.jar. When I run my code, everything runs fine, and then I get this error:
Exception in thread "main" org.apache.tika.exception.TikaException: Unexpected RuntimeException from org.apache.tika.parser.microsoft.OfficeParser@3ea25501
    at org.apache.tika.parser.CompositeParser.parse(CompositeParser.java:258)
    at org.apache.tika.parser.CompositeParser.parse(CompositeParser.java:256)
    at org.apache.tika.parser.AutoDetectParser.parse(AutoDetectParser.java:120)
    at recruitmentprototyp.RecruitmentPrototyp.tikareadDoc(RecruitmentPrototyp.java:135)
    at recruitmentprototyp.RecruitmentPrototyp.doForAll(RecruitmentPrototyp.java:110)
    at recruitmentprototyp.RecruitmentPrototyp.main(RecruitmentPrototyp.java:897)
Caused by: java.lang.IllegalStateException: Pap style 19 claimed to have itself as its parent, which isn't allowed
    at org.apache.poi.hwpf.model.StyleSheet.createPap(StyleSheet.java:232)
    at org.apache.poi.hwpf.model.StyleSheet.<init>(StyleSheet.java:120)
    at org.apache.poi.hwpf.HWPFDocument.<init>(HWPFDocument.java:346)
    at org.apache.tika.parser.microsoft.WordExtractor.parse(WordExtractor.java:81)
    at org.apache.tika.parser.microsoft.OfficeParser.parse(OfficeParser.java:201)
    at org.apache.tika.parser.microsoft.OfficeParser.parse(OfficeParser.java:172)
    at org.apache.tika.parser.CompositeParser.parse(CompositeParser.java:256)
    ... 5 more
Java Result: 1
What should I do?