You can use xdmp:document-filter() to read an XLSX and produce XHTML output.
I have used xdmp:document-filter()
to quickly and easily process XLSX files, transform the XHTML output into multiple XML documents, and then insert them into the MarkLogic database.
Each row will produce an XHTML <p>
element (don't forget that it is bound to the namespace http://www.w3.org/1999/xhtml
) whose text()
node holds the row's column values separated by commas, in addition to some useful <meta>
elements with information about the file.
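As a rough illustration, the call might look like the sketch below. The URI /uploads/example.xlsx is hypothetical, and the sketch assumes the workbook has already been loaded into the database as a binary document.

```xquery
xquery version "1.0-ml";
declare namespace xhtml = "http://www.w3.org/1999/xhtml";

(: Assumption: the workbook is already stored as a binary document
   at this hypothetical URI. :)
let $filtered := xdmp:document-filter(fn:doc("/uploads/example.xlsx"))
return (
  (: file-level metadata reported by the filter :)
  $filtered//xhtml:meta,
  (: one <p> per row, plus a few extra ones such as sheet names :)
  $filtered//xhtml:p
)
```

Without the namespace declaration, a path such as //p would select nothing, because the elements are in the XHTML namespace.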
For instance, a row with three columns:
foo | bar | baz
would produce:
<p>foo,bar,baz</p>
You could select the meaningful rows of data and then tokenize the CSV values to produce your columns for each row of data in the <p>
elements.
You might need to filter out some of the <p>
elements generated for the sheet tabs:
<p>Sheet1</p>
as well as rows that do not contain any values and simply produce a sequence of commas:
<p>,,,</p>
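Putting the filtering and tokenizing together, a minimal sketch could look like the following. Everything specific in it is an assumption for illustration: the URI is hypothetical, the <row> and <col> element names are made up, and the sheet-tab test only catches default names such as Sheet1, so renamed tabs would need a different check. A plain tokenize on commas will also split cell values that themselves contain commas.

```xquery
xquery version "1.0-ml";
declare namespace xhtml = "http://www.w3.org/1999/xhtml";

(: Assumption: hypothetical URI of an XLSX stored as a binary document. :)
let $filtered := xdmp:document-filter(fn:doc("/uploads/example.xlsx"))
for $p in $filtered//xhtml:p
let $text := fn:string($p)
where
  (: keep only rows with at least one character that is not a comma or whitespace :)
  fn:matches($text, "[^,\s]")
  (: drop sheet-tab paragraphs; assumes default names like Sheet1, Sheet2 :)
  and fn:not(fn:matches($text, "^Sheet\d+$"))
return
  (: hypothetical output shape: one <row> per kept paragraph, one <col> per value :)
  <row>{
    for $value in fn:tokenize($text, ",")
    return <col>{ $value }</col>
  }</row>
```

Each resulting <row> could then be wrapped in whatever document structure you need and passed to xdmp:document-insert() with a URI of your choosing.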