I am using Apache Tika to extract text from various document formats. I would also like to extract the images embedded in those files (usually PDF or Word documents).
As a proof of concept I am running TikaCLI with the -z (--extract) option, but it never extracts any attachments. TikaCLI's help screen and several websites suggest this should work, yet Tika produces no output at all:
C:\work>Setup.CIPDev-6-3-0-2583\java\bin\java.exe -jar Setup.CIPDev-6-3-0-2583\tomcat\webapps\JavaBridge\WEB-INF\lib\tika-app-1.3.jar -z attachment.pdf
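For comparison, the same extraction can be attempted through Tika's Java API by registering a custom EmbeddedDocumentExtractor in the ParseContext. This is a minimal sketch written against the Tika 1.x API, assuming (as in Tika 1.x) that container parsers hand each embedded resource to the EmbeddedDocumentExtractor found in the context. The class names ExtractEmbedded and SaveToDirExtractor and the "N-name" output naming are hypothetical. It only saves what the parser actually reports, so if a given Tika version's PDF or Word parser does not report images as embedded resources, this writes nothing either.

```java
import java.io.IOException;
import java.io.InputStream;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.Paths;
import java.nio.file.StandardCopyOption;

import org.apache.tika.extractor.EmbeddedDocumentExtractor;
import org.apache.tika.metadata.Metadata;
import org.apache.tika.parser.AutoDetectParser;
import org.apache.tika.parser.ParseContext;
import org.apache.tika.sax.BodyContentHandler;
import org.xml.sax.ContentHandler;

public class ExtractEmbedded {

    // Hypothetical extractor: copies every embedded resource the parser reports into outDir.
    static class SaveToDirExtractor implements EmbeddedDocumentExtractor {
        private final Path outDir;
        private int count = 0;

        SaveToDirExtractor(Path outDir) {
            this.outDir = outDir;
        }

        @Override
        public boolean shouldParseEmbedded(Metadata metadata) {
            return true; // accept every embedded resource
        }

        @Override
        public void parseEmbedded(InputStream stream, ContentHandler handler,
                                  Metadata metadata, boolean outputHtml) throws IOException {
            String name = metadata.get(Metadata.RESOURCE_NAME_KEY);
            // Replace unsafe characters and prefix a counter so names are unique
            // and cannot point outside outDir.
            String safe = (name == null) ? "unnamed" : name.replaceAll("[^A-Za-z0-9._-]", "_");
            Path target = outDir.resolve(count++ + "-" + safe);
            // Do not close the stream here; the calling parser owns it.
            Files.copy(stream, target, StandardCopyOption.REPLACE_EXISTING);
            System.out.println("Extracted " + target);
        }
    }

    public static void main(String[] args) throws Exception {
        Path input = Paths.get(args[0]);
        Path outDir = Paths.get(args.length > 1 ? args[1] : ".");
        Files.createDirectories(outDir);

        AutoDetectParser parser = new AutoDetectParser();
        ParseContext context = new ParseContext();
        context.set(EmbeddedDocumentExtractor.class, new SaveToDirExtractor(outDir));

        try (InputStream in = Files.newInputStream(input)) {
            // -1 disables the write limit on the extracted body text.
            parser.parse(in, new BodyContentHandler(-1), new Metadata(), context);
        }
    }
}
```

With the tika-app jar on the classpath, this could be run with something like `java -cp tika-app-1.3.jar;. ExtractEmbedded attachment.pdf out`. If it prints nothing, the parser is not reporting any embedded resources for that file, which would match the empty output from `-z`.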
I have tried a variety of arguments, files, and attachment combinations with no success. Has anyone successfully extracted attachments from files with Apache Tika? If so, can you provide some guidance on how you did it?
Any help is greatly appreciated.