I'm currently writing a program in Java to extract metadata from multiple document types. At the moment I'm trying to extract metadata from .vsd files using Apache Tika. I previously tried using Apache POI directly, but it's very hard to find any documentation on this rarely used part of the library, so I decided to go with Tika.

OK, so here is the code sample that crashes (at line 7):

        ParseContext context = new ParseContext();
        Metadata metadata = new Metadata();
        WriteOutContentHandler handler = new WriteOutContentHandler(10 * 1024 * 1024);
        try {
            FileInputStream fis = new FileInputStream(fileName);
            OfficeParser officeParser = new OfficeParser();
            officeParser.parse(fis, handler, metadata, context);
            String[] metadataNames = metadata.names();

            // Display all metadata
            for (String name : metadataNames) {
                System.out.println(name + ": " + metadata.get(name));
            }
        } catch (FileNotFoundException e) {
            System.out.println("No such file: " + fileName);
        }

And here is the stack trace:

        Exception in thread "main" java.lang.RuntimeException: TODO
            at org.apache.poi.hdgf.pointers.PointerFactory.createPointer(PointerFactory.java:45)
            at org.apache.poi.hdgf.HDGFDiagram.<init>(HDGFDiagram.java:99)
            at org.apache.poi.hdgf.extractor.VisioTextExtractor.<init>(VisioTextExtractor.java:55)
            at org.apache.tika.parser.microsoft.OfficeParser.parse(OfficeParser.java:200)
            at org.apache.tika.parser.microsoft.OfficeParser.parse(OfficeParser.java:161)
            at VsdFile.displayMetadata(VsdFile.java:43)
            at main.main(main.java:26)
            at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
            at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:57)
            at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
            at java.lang.reflect.Method.invoke(Method.java:601)
            at com.intellij.rt.execution.application.AppMain.main(AppMain.java:120)
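Note that the exception in the trace is an unchecked `RuntimeException`, so a `catch (FileNotFoundException ...)` block will not intercept it and it propagates all the way up to `main`. When processing many files, one option is to wrap each extraction so that a single unreadable file is skipped rather than ending the run. The following is only a minimal sketch, assuming Java 8 or later; `SafeExtraction`, `Extractor`, and `tryExtract` are hypothetical names introduced for illustration and are not part of Tika or POI. The `Extractor` interface stands in for whatever per-file call you make (for example the `officeParser.parse(...)` call).

```java
import java.util.Collections;
import java.util.Map;
import java.util.Optional;

/** Hypothetical helper: runs one extraction and reports failure instead of crashing. */
final class SafeExtraction {

    /** Stand-in for any per-file extraction step, e.g. a call into a parser. */
    interface Extractor {
        Map<String, String> extract(String fileName) throws Exception;
    }

    /** Returns the extracted metadata, or an empty Optional if the extractor threw. */
    static Optional<Map<String, String>> tryExtract(String fileName, Extractor extractor) {
        try {
            return Optional.ofNullable(extractor.extract(fileName));
        } catch (Exception e) {
            // Exception covers checked exceptions and unchecked ones
            // such as the RuntimeException("TODO") seen in the trace.
            System.err.println("Skipping " + fileName + ": " + e);
            return Optional.empty();
        }
    }

    public static void main(String[] args) {
        // Simulated extractors; a real one would call into the parsing library.
        Extractor failing = name -> { throw new RuntimeException("TODO"); };
        Extractor working = name -> Collections.singletonMap("title", "Diagram");

        System.out.println(tryExtract("old.vsd", failing).isPresent());
        System.out.println(tryExtract("new.vsd", working).isPresent());
    }
}
```

This does not fix the underlying parsing failure; it only keeps the rest of the batch running and records which file was skipped.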

I'm pretty rusty in Java, so I hope the answer to my question isn't too obvious.

Thank you.

Regards,

Bdloul

  • It looks like you have an older (v5) Visio file, which isn't currently supported by Apache POI (hence the TODO). Might you be willing to do a little bit of coding, and submit a patch to add the missing functionality? – Gagravarr Jan 18 '13 at 09:53
  • I'd love to if only I knew where to start. – Bdloul Jan 21 '13 at 23:21
  • You'd want to grab the file format specification documentation from the Microsoft website, then cross reference that with some hex dumps from a few different files.... – Gagravarr Jan 22 '13 at 18:23
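For the hex-dump step suggested in the comments, a small JDK-only utility is enough to compare the first few hundred bytes of several files side by side. This is just a sketch: the class name `HexDump`, the method `dump`, and the 256-byte limit are arbitrary choices for illustration, and it reads the whole file into memory, which is fine for small samples only.

```java
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Paths;

/** Hypothetical utility: prints bytes as offset, hex values, and printable ASCII. */
final class HexDump {

    /** Formats up to 'length' bytes of 'data' in rows of 16. */
    static String dump(byte[] data, int length) {
        StringBuilder out = new StringBuilder();
        int limit = Math.min(length, data.length);
        for (int row = 0; row < limit; row += 16) {
            out.append(String.format("%08x ", row));
            StringBuilder ascii = new StringBuilder();
            for (int i = row; i < row + 16; i++) {
                if (i < limit) {
                    int b = data[i] & 0xff;
                    out.append(String.format(" %02x", b));
                    // Show printable ASCII as-is, everything else as a dot.
                    ascii.append(b >= 0x20 && b < 0x7f ? (char) b : '.');
                } else {
                    out.append("   "); // pad a short final row
                }
            }
            out.append("  ").append(ascii).append('\n');
        }
        return out.toString();
    }

    public static void main(String[] args) throws IOException {
        // Usage: java HexDump path/to/file.vsd
        byte[] data = Files.readAllBytes(Paths.get(args[0]));
        System.out.print(dump(data, 256));
    }
}
```

Running it on a file that parses and on one that fails makes it easier to spot where the structures start to differ.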

1 Answer

So the problem was a bad vsd file.
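For anyone hitting something similar, a cheap sanity check before handing a file to the parser is to look at its first bytes. The sketch below assumes binary .vsd files are OLE2 compound documents (which is how POI's HDGF code opens them) and compares the first eight bytes against the OLE2 signature `D0 CF 11 E0 A1 B1 1A E1`. The class and method names are made up for illustration. It only rules out files that are not OLE2 containers at all; it cannot tell whether the Visio version inside a valid container is one POI supports.

```java
import java.io.DataInputStream;
import java.io.EOFException;
import java.io.FileInputStream;
import java.io.IOException;
import java.util.Arrays;

/** Hypothetical pre-check: does the file start with the OLE2 signature? */
final class Ole2Check {

    private static final byte[] OLE2_SIGNATURE = {
        (byte) 0xD0, (byte) 0xCF, 0x11, (byte) 0xE0,
        (byte) 0xA1, (byte) 0xB1, 0x1A, (byte) 0xE1
    };

    /** True if the first eight bytes of 'header' match the OLE2 signature. */
    static boolean hasOle2Signature(byte[] header) {
        return header.length >= OLE2_SIGNATURE.length
                && Arrays.equals(Arrays.copyOf(header, OLE2_SIGNATURE.length), OLE2_SIGNATURE);
    }

    /** Reads the first eight bytes of the file and checks them. */
    static boolean looksLikeOle2(String fileName) throws IOException {
        byte[] header = new byte[OLE2_SIGNATURE.length];
        try (DataInputStream in = new DataInputStream(new FileInputStream(fileName))) {
            in.readFully(header);
        } catch (EOFException e) {
            return false; // file is shorter than eight bytes
        }
        return hasOle2Signature(header);
    }

    public static void main(String[] args) throws IOException {
        // Usage: java Ole2Check path/to/file.vsd
        System.out.println(looksLikeOle2(args[0]));
    }
}
```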

Bdloul