I am currently trying to parse the semantic structure of a PDF file. I believe the metadata added to make PDFs accessible is the correct way to go about it, but I can't find a library that will handle it cleanly.

I've tried PDFlib TET on iOS, but I can't get it to open certain test documents, and the error it returns is too obscure to be Googleable.

Are there any other libraries that do the same?

ruipacheco
  • What's the goal of parsing the semantics? Are you rendering to HTML? – ckundo Oct 26 '13 at 13:31
  • More explanation is required. Do you wish to add structure to something where it does not exist? What library could interpret some text string as h1 or h2, or a collection of information as a table? If you are working with source content and wish to generate tagged PDF, then that is different. – Kevin Brown Oct 26 '13 at 17:16
  • As @ckundo said, I want to read a tagged PDF and turn it into HTML. – ruipacheco Oct 26 '13 at 17:31

1 Answer

I'd have a look at the pCOS library (also from http://pdflib.com). For PHP, an alternative worth a look is http://www.setasign.com/. They might have a tool for that purpose.
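
Whichever library ends up reading the structure tree, the HTML step is mostly a tree walk that maps the standard tagged-PDF structure types (H1, P, L, LI, Table and so on) to HTML elements. Here is a minimal sketch in Python. It assumes the tree has already been extracted into nested dicts; the `type` / `text` / `kids` layout, the `TAG_MAP` table, and the `to_html` function are all hypothetical names for illustration, not the API of pCOS or any other library.

```python
from html import escape

# Illustrative, non-exhaustive mapping from standard PDF structure
# types to HTML elements. Adjust to taste (e.g. "L" may be an <ol>).
TAG_MAP = {
    "Document": "article",
    "Sect": "section",
    "Div": "div",
    "P": "p",
    "H1": "h1", "H2": "h2", "H3": "h3",
    "H4": "h4", "H5": "h5", "H6": "h6",
    "L": "ul",
    "LI": "li",
    "Table": "table",
    "TR": "tr",
    "TH": "th",
    "TD": "td",
    "Span": "span",
}


def to_html(node):
    """Render one node of the (hypothetical) dict-based structure tree."""
    # Escape the node's own text, then render its children in order.
    inner = escape(node.get("text", ""))
    inner += "".join(to_html(kid) for kid in node.get("kids", []))

    tag = TAG_MAP.get(node["type"])
    if tag is None:
        # Unmapped structure type: keep the content, drop the wrapper.
        return inner
    return f"<{tag}>{inner}</{tag}>"


# Hand-written sample tree standing in for real extracted data.
sample = {
    "type": "Document",
    "kids": [
        {"type": "H1", "text": "Report"},
        {"type": "P", "text": "Intro & summary."},
        {"type": "L", "kids": [
            {"type": "LI", "text": "First"},
            {"type": "LI", "text": "Second"},
        ]},
    ],
}

print(to_html(sample))
```

Unmapped types such as `LBody` simply pass their content through, so a partial mapping still produces usable output. This only works for PDFs that are actually tagged; it does not infer structure from untagged text.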

heiglandreas