I'm trying to parse a PDF file and get its metadata and text, but I still don't get the results I want. I'm sure it's a silly mistake, but I can't see it. The file d.pdf exists and is located in the project's root folder. The imports are also correct.

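import java.io.File;
import java.io.IOException;
import java.io.InputStream;
import org.apache.tika.exception.TikaException;
import org.apache.tika.io.TikaInputStream;
import org.apache.tika.metadata.Metadata;
import org.apache.tika.parser.AutoDetectParser;
import org.apache.tika.parser.ParseContext;
import org.apache.tika.parser.Parser;
import org.apache.tika.sax.BodyContentHandler;
import org.xml.sax.SAXException;

public class MultiParse {
      public static void main(final String[] args) throws IOException,
                  SAXException, TikaException {
            Parser parser = new AutoDetectParser();
            File f = new File("d.pdf");
            System.out.println("------------ Parsing a PDF:");
            extractFromFile(parser, f);
      }

      private static void extractFromFile(final Parser parser,
                  final File f) throws IOException, SAXException,
                  TikaException {
            // Collects the body text, capped at 10,000,000 characters.
            BodyContentHandler handler = new BodyContentHandler(10000000);
            Metadata metadata = new Metadata();
            // try-with-resources closes the stream even if parsing fails.
            try (InputStream is = TikaInputStream.get(f)) {
                  parser.parse(is, handler, metadata, new ParseContext());
            }
            for (String name : metadata.names()) {
                  System.out.println(name + ":\t" + metadata.get(name));
            }
      }
}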

Output: no errors, but not much else either :(

------------ Parsing a PDF:
Content-Type:   application/pdf
yeaaaahhhh..hamf hamf
  • Where is your file? Are you sure that either method is actually finding the real file? – Gagravarr May 08 '13 at 09:19
  • It seems that it doesn't find the file, and I don't know why. I tried "d.pdf", "/d.pdf", an absolute path, a relative path, and last but not least I copied d.pdf to every folder of the project (an act of despair)... nothing. – yeaaaahhhh..hamf hamf May 08 '13 at 11:27
  • Is it supposed to be outside or inside of the project classpath? – Gagravarr May 08 '13 at 12:13
  • Thanks, I did my reading. The program finds the file (I tested with file.exists() and file.isFile()). The output is not encouraging: it only finds the content type. – yeaaaahhhh..hamf hamf May 08 '13 at 18:29
  • Did you try running the `tika-cli` tool against it? That'll show you what metadata is available. Also, do you have all of the Tika jars on your classpath? Just the content type makes me think you're missing the PDF related classes – Gagravarr May 09 '13 at 11:21
  • This was helpful: http://filotechnologia.blogspot.it/2013/10/armed-with-tika-you-can-be-confident-of.html – yeaaaahhhh..hamf hamf Oct 02 '13 at 08:41
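
Following up on the classpath suggestion in the comments: one quick way to test that theory is to ask the JVM whether the PDF-related classes can be loaded at all. The sketch below uses only the standard library. The class name `ClasspathCheck` and the helper `isOnClasspath` are made up for illustration, and the two class names it probes (`org.apache.tika.parser.pdf.PDFParser` and `org.apache.pdfbox.pdmodel.PDDocument`) are assumptions about which classes a Tika 1.x setup with PDF support would normally provide; adjust them to your Tika version.

```java
// Hypothetical diagnostic, not part of Tika: reports whether the classes
// needed for PDF parsing are visible to the current class loader.
class ClasspathCheck {

    // Returns true if the named class can be located without initializing it.
    static boolean isOnClasspath(String className) {
        try {
            Class.forName(className, false, ClasspathCheck.class.getClassLoader());
            return true;
        } catch (ClassNotFoundException | LinkageError e) {
            // Class itself, or one of its dependencies, is missing.
            return false;
        }
    }

    public static void main(String[] args) {
        // Assumed class names; they may differ between Tika/PDFBox versions.
        String[] candidates = {
            "org.apache.tika.parser.pdf.PDFParser",
            "org.apache.pdfbox.pdmodel.PDDocument"
        };
        for (String name : candidates) {
            System.out.println(name + ":\t"
                    + (isOnClasspath(name) ? "found" : "MISSING"));
        }
    }
}
```

If either line reports MISSING when run with the same classpath as MultiParse, that would be consistent with AutoDetectParser detecting the content type but having no PDF parser to hand the file to, which would explain output that stops at Content-Type.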

0 Answers