I have a .docx file that contains an embedded .pptx file, which in turn contains images. I'm trying to figure out how to extract all of the embedded binary files recursively, so that I get the .pptx and, most importantly, its images. I saw that there is an "/unpack" endpoint, but it does not work recursively: it extracts only the direct child documents of the original file. Any ideas on how to get this done using only the Tika server?
Regrettably, Apache Tika doesn't yet support this. The discussion for this functionality is here: https://issues.apache.org/jira/browse/TIKA-3703 – Tim Allison Jan 23 '23 at 19:21
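Until something like that lands, one possible workaround is to drive the recursion from the client: send the file to "/unpack", then send each extracted child back to "/unpack" until nothing more comes out. The sketch below is only that, a sketch. It assumes the server is reachable at `http://localhost:9998`, that "/unpack" answers a PUT with a zip archive of the direct children, and that it returns an empty (non-200) response when there is nothing embedded. The names `tika_unpack`, `extract_recursive`, and `max_depth` are made up for illustration and are not part of Tika.

```python
import io
import urllib.error
import urllib.request
import zipfile

# Assumption: a Tika server running locally on its usual port.
TIKA_UNPACK_URL = "http://localhost:9998/unpack"


def tika_unpack(data: bytes) -> bytes:
    """PUT raw bytes to /unpack; return the zip body, or b"" if none came back."""
    req = urllib.request.Request(TIKA_UNPACK_URL, data=data, method="PUT")
    try:
        with urllib.request.urlopen(req) as resp:
            # Anything other than 200 is treated as "no embedded files".
            return resp.read() if resp.status == 200 else b""
    except urllib.error.HTTPError:
        # Treat server-side errors for this file as "nothing to extract".
        return b""


def extract_recursive(data, unpack=tika_unpack, prefix="", max_depth=5):
    """Return {path: bytes} for every embedded file found, at any depth."""
    found = {}
    if max_depth <= 0:
        return found  # hypothetical safety limit against very deep nesting
    archive = unpack(data)
    if not archive or not zipfile.is_zipfile(io.BytesIO(archive)):
        return found
    with zipfile.ZipFile(io.BytesIO(archive)) as zf:
        for info in zf.infolist():
            if info.is_dir():
                continue
            child = zf.read(info)
            path = prefix + info.filename
            found[path] = child
            # Feed each child back to the server to look one level deeper.
            found.update(
                extract_recursive(child, unpack, path + "/", max_depth - 1)
            )
    return found


if __name__ == "__main__":
    with open("input.docx", "rb") as f:  # hypothetical input file
        for name, blob in extract_recursive(f.read()).items():
            print(name, len(blob))
```

This costs one request per embedded file, and the entry names are whatever the server puts in the archive, so the nested paths are just those names joined with "/". The `unpack` parameter is only there so the recursion can be exercised without a running server.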