
We are trying to convert a PDF to XML using the following query:

xquery version "1.0-ml";
let $results := xdmp:pdf-convert(
    xdmp:document-get("d:\CFR-2010-title48-vol1.pdf"),
    "CFR-2010-title48-vol1.xml"),
  $manifest := $results[1]
return $results

But it didn't generate XML output for the PDF. Instead, it generated the following output files:

<parts xmlns="xdmp:pdf-convert">
  <part>CFR-2010-title48-vol1_xml.xhtml</part>
  <part>CFR-2010-title48-vol1_xml_parts/01_00.jpg</part>
  <part>CFR-2010-title48-vol1_xml_parts/01_01.jpg</part>
  <part>CFR-2010-title48-vol1_xml_parts/conv.css</part>
  <part>CFR-2010-title48-vol1_xml_parts/toc.txt</part>
</parts>

Can you please suggest how to generate XML output for the given PDF file?

Thanks

Venkat


1 Answer


The first document returned after the manifest, CFR-2010-title48-vol1_xml.xhtml, is XML: XHTML is an XML format.
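To make that concrete, here is a minimal sketch that separates the manifest from the converted parts and returns only the XHTML document. It assumes the first item returned is the `parts` manifest and that the remaining items follow in manifest order; the variable name `$parts` is just for illustration.

```xquery
xquery version "1.0-ml";

(: Sketch only: assumes item 1 is the manifest and the
   remaining items are the parts, in manifest order. :)
let $results := xdmp:pdf-convert(
    xdmp:document-get("d:\CFR-2010-title48-vol1.pdf"),
    "CFR-2010-title48-vol1.xml")
let $manifest := $results[1]
let $parts := $results[2 to last()]
(: The first part listed in the manifest is the .xhtml file,
   so the first converted item should be the XHTML document. :)
return $parts[1]
```

If that assumption holds for your output, the returned node can be queried with XPath like any other XML.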

Were you looking to get the DocBook? For that you need to run the entire upconversion process, and the easiest way to do that is to run the document through the CPF conversion application, which runs through a series of steps and inferences to get to that point.

Or: Are you wondering why the name in the part doesn't match the name from the second parameter to xdmp:pdf-convert? The second parameter is just used to adjust the generated hrefs to images; it is not used for the conversion output itself.
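Since the second parameter does not name the output, one way to control where the results end up is to insert each returned node into the database yourself. The following is a rough sketch under two assumptions: the parts come back in the same order as the manifest entries, and the `/converted/` URI prefix is a hypothetical choice, not something the function requires.

```xquery
xquery version "1.0-ml";

(: Sketch only: pairs each manifest entry with the node at the
   same position and stores it under a URI of our choosing. :)
let $results := xdmp:pdf-convert(
    xdmp:document-get("d:\CFR-2010-title48-vol1.pdf"),
    "CFR-2010-title48-vol1.xml")
let $manifest := $results[1]
let $parts := $results[2 to last()]
for $name at $i in $manifest//*:part
return
  (: "/converted/" is a hypothetical prefix; use whatever URI scheme fits. :)
  xdmp:document-insert(
    fn:concat("/converted/", fn:string($name)),
    $parts[$i])
```

It is worth checking that the number of manifest entries matches the number of returned parts before relying on the positional pairing.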

Or: If you want to target XML of some other kind (not XHTML) directly from the format conversion of xdmp:pdf-convert, you can apply a different configuration file. See the documentation on that function for more details.

mholstege
  • Documentation of xdmp:pdf-convert can be found here: http://docs.marklogic.com/xdmp:pdf-convert – grtjn Jan 07 '14 at 21:42