
Is there a function in MarkLogic that takes an Excel file as input and converts it into an XML file?

I came across xdmp:excel-convert(), but it generates an .xhtml file, and it does not work for Excel files with the .xlsx extension.

I am using MarkLogic 7.

Dixit Singla

3 Answers


If you install the Content Processing Framework (CPF) and the conversion application, you can use them to upconvert .xls Excel files to simplified DocBook. If you also attach the Office OpenXML Extract pipeline, it will unpack .xlsx Excel files and do a modest amount of clean-up on them.

mholstege

In addition to the good recommendation from mholstege, note that .xlsx files are just ZIP files with XML inside. Here's a blog post giving an example of how to pull the XML out of a .docx file.

Sam Mefford

You can use xdmp:document-filter() to read an XLSX and produce XHTML output.

I have used xdmp:document-filter() to quickly and easily process XLSX files, transform the XHTML output into multiple XML documents, and then insert them into the MarkLogic database.

Each row produces an XHTML <p> element (don't forget that it is bound to the namespace http://www.w3.org/1999/xhtml) whose text() node holds the row's column values as comma-separated values. The output also contains some useful <meta> elements with information about the file.

For instance, a row with three columns containing foo, bar, and baz would produce:

<p>foo,bar,baz</p>

You can then select the meaningful rows of data and tokenize the comma-separated text of each <p> element to produce the columns for that row.
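As a language-neutral illustration of that step (written in Python rather than XQuery, and assuming you have the filter output serialized as an XHTML string), the sketch below finds the <p> elements in the XHTML namespace and splits each one on commas. The name paragraph_rows is hypothetical, and a plain comma split assumes that cell values contain no commas of their own; whether the filter escapes such values is not covered here. In XQuery the equivalent split would use fn:tokenize.

```python
import xml.etree.ElementTree as ET

XHTML_NS = "http://www.w3.org/1999/xhtml"


def paragraph_rows(xhtml):
    """Split the text of every XHTML <p> element into a list of column values."""
    root = ET.fromstring(xhtml)
    rows = []
    # The <p> elements are in the XHTML namespace, so a search for an
    # unqualified "p" would find nothing.
    for p in root.iter(f"{{{XHTML_NS}}}p"):
        text = "".join(p.itertext())
        rows.append(text.split(","))
    return rows
```

For example, paragraph_rows applied to a document containing <p>foo,bar,baz</p> in the XHTML namespace returns [["foo", "bar", "baz"]], while the same markup without the namespace declaration yields an empty list.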

You might need to filter out some of the <p> elements generated for the sheet tabs:

<p>Sheet1</p>

as well as rows that contain no values and therefore produce only a sequence of commas:

<p>,,,</p>
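A minimal Python sketch of that clean-up, assuming you already have the text of each <p> element in document order and know the workbook's sheet tab names. Both meaningful_rows and its sheet_names parameter are hypothetical; the caller has to supply the tab names.

```python
def meaningful_rows(paragraph_texts, sheet_names):
    """Drop sheet-tab and empty rows, then split the remaining rows on commas.

    paragraph_texts: the text of each <p> element, in document order.
    sheet_names: names of the workbook's sheet tabs (hypothetical input).
    """
    tabs = set(sheet_names)
    rows = []
    for text in paragraph_texts:
        stripped = text.strip()
        if stripped in tabs:
            continue  # a <p> that only names a sheet tab, such as "Sheet1"
        if not stripped.replace(",", "").strip():
            continue  # an empty row rendered as a run of commas, such as ",,,"
        rows.append(text.split(","))
    return rows
```

One caveat of this heuristic: a single-column data row whose value happens to equal a sheet name would also be dropped.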
Mads Hansen