The following MS Word document contains only two bullet points from separate lists, each encapsulated in a one-cell table.
How do I use the Word document's underlying document.xml, numbering.xml, and styles.xml to capture the type of bullet point (i.e., circle or square)? After reading the http://officeopenxml.com docs and other SO posts, I attempted the following to no avail:
1. With document.xml, retrieve the $num_id = w:numPr/w:numId/@w:val and $lvl_id = w:numPr/w:ilvl/@w:val values.
2. With numbering.xml, use the $num_id value above to retrieve $abs_id = w:num[@w:numId = $num_id]/w:abstractNumId/@w:val, then return the corresponding symbol: w:abstractNum[@w:abstractNumId = $abs_id]/w:lvl[@w:ilvl = $lvl_id]/w:lvlText/@w:val. However, this result is not correct, as both paragraphs return the square bullet.
3. With styles.xml, review the ListParagraph w:style for any additional matching criteria. However, no unique identifiers or values appear useful.

What am I missing?
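For reference, the lookup chain I attempted can be sketched outside XSLT as follows. This is only a minimal Python sketch using the standard library: the inline numbering fragment is a trimmed stand-in, the two lvlText characters (U+F0A3 and U+F0A1) are hypothetical placeholders rather than the values in my file, and bullet_info is a helper name invented for illustration. It reports the code points of w:lvlText together with the w:rFonts font of the level, since symbol-font characters tend to look blank or identical when printed raw. It does not handle w:lvlOverride inside w:num.

```python
import xml.etree.ElementTree as ET

W = "http://schemas.openxmlformats.org/wordprocessingml/2006/main"
NS = {"w": W}

# Hypothetical, trimmed numbering.xml: the lvlText characters are placeholders.
NUMBERING_XML = f"""
<w:numbering xmlns:w="{W}">
  <w:abstractNum w:abstractNumId="0">
    <w:lvl w:ilvl="0">
      <w:numFmt w:val="bullet"/>
      <w:lvlText w:val="&#xF0A3;"/>
      <w:rPr><w:rFonts w:ascii="Wingdings 2" w:hAnsi="Wingdings 2"/></w:rPr>
    </w:lvl>
  </w:abstractNum>
  <w:abstractNum w:abstractNumId="8">
    <w:lvl w:ilvl="0">
      <w:numFmt w:val="bullet"/>
      <w:lvlText w:val="&#xF0A1;"/>
      <w:rPr><w:rFonts w:ascii="Wingdings 2" w:hAnsi="Wingdings 2"/></w:rPr>
    </w:lvl>
  </w:abstractNum>
  <w:num w:numId="5"><w:abstractNumId w:val="0"/></w:num>
  <w:num w:numId="9"><w:abstractNumId w:val="8"/></w:num>
</w:numbering>
"""


def q(name):
    """Return the namespace-qualified form of a w: element or attribute name."""
    return f"{{{W}}}{name}"


def bullet_info(root, num_id, ilvl):
    """Follow numId -> abstractNumId -> lvl; return (code points, font) or None."""
    abs_id = None
    for num in root.findall("w:num", NS):
        if num.get(q("numId")) == num_id:
            abs_id = num.find("w:abstractNumId", NS).get(q("val"))
            break
    if abs_id is None:
        return None
    for abstract in root.findall("w:abstractNum", NS):
        if abstract.get(q("abstractNumId")) != abs_id:
            continue
        for lvl in abstract.findall("w:lvl", NS):
            if lvl.get(q("ilvl")) == ilvl:
                text = lvl.find("w:lvlText", NS).get(q("val"), "")
                fonts = lvl.find("w:rPr/w:rFonts", NS)
                font = fonts.get(q("ascii")) if fonts is not None else None
                # Code points make otherwise invisible symbol characters comparable.
                return [f"U+{ord(c):04X}" for c in text], font
    return None


root = ET.fromstring(NUMBERING_XML)
for num_id in ("5", "9"):
    print(num_id, bullet_info(root, num_id, "0"))
```

If the two paragraphs really point at different abstractNum entries, the printed code points should differ even when the raw symbols look the same in a console or editor.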
The relevant sections of the XML documents are shown below. Please advise if other sections or documents are relevant.
document.xml
<w:p w14:paraId="16A4A39D"
w14:textId="10E79F44"
w:rsidR="00DB3D99"
w:rsidRPr="00D6457F"
w:rsidRDefault="00DB3D99"
w:rsidP="007205D3">
<w:pPr>
<w:pStyle w:val="ListParagraph"/>
<w:keepNext/>
<w:numPr>
<w:ilvl w:val="0"/>
<w:numId w:val="5"/>
</w:numPr>
<w:spacing w:before="80" w:after="80"/>
<w:contextualSpacing w:val="0"/>
<w:rPr>
<w:rFonts w:ascii="Franklin Gothic Book" w:hAnsi="Franklin Gothic Book"/>
<w:bCs/>
<w:sz w:val="20"/>
<w:szCs w:val="20"/>
</w:rPr>
</w:pPr>
<w:r w:rsidRPr="00DB3D99">
<w:rPr>
<w:rFonts w:ascii="Franklin Gothic Book" w:hAnsi="Franklin Gothic Book"/>
<w:bCs/>
<w:sz w:val="20"/>
<w:szCs w:val="20"/>
</w:rPr>
<w:t>Mainstreaming environmental considerations into social and economic decisions at all levels is of vital importance</w:t>
</w:r>
</w:p>
...
<w:p w14:paraId="79FEF50C"
w14:textId="65464CBE"
w:rsidR="009C1A5F"
w:rsidRPr="009C1A5F"
w:rsidRDefault="009C1A5F"
w:rsidP="009C1A5F">
<w:pPr>
<w:pStyle w:val="ListParagraph"/>
<w:keepNext/>
<w:numPr>
<w:ilvl w:val="0"/>
<w:numId w:val="9"/>
</w:numPr>
<w:spacing w:before="80" w:after="80"/>
<w:rPr>
<w:rFonts w:ascii="Franklin Gothic Book" w:hAnsi="Franklin Gothic Book"/>
<w:sz w:val="20"/>
<w:szCs w:val="20"/>
</w:rPr>
</w:pPr>
<w:r w:rsidRPr="009C1A5F">
<w:rPr>
<w:rFonts w:ascii="Franklin Gothic Book" w:hAnsi="Franklin Gothic Book"/>
<w:bCs/>
<w:sz w:val="20"/>
<w:szCs w:val="20"/>
</w:rPr>
<w:t>Solutions need to seek an integrated approach that simultaneously address the conservation of the planet’s genetic diversity, species and ecosystems</w:t>
</w:r>
</w:p>
numbering.xml
<w:abstractNum w:abstractNumId="0" w15:restartNumberingAfterBreak="0">
<w:nsid w:val="037970D6"/>
<w:multiLevelType w:val="hybridMultilevel"/>
<w:tmpl w:val="98A2E35C"/>
<w:lvl w:ilvl="0" w:tplc="E7067EF0">
<w:start w:val="1"/>
<w:numFmt w:val="bullet"/>
<w:lvlText w:val=""/>
<w:lvlJc w:val="left"/>
<w:pPr>
<w:ind w:left="360" w:hanging="360"/>
</w:pPr>
<w:rPr>
<w:rFonts w:ascii="Wingdings 2" w:hAnsi="Wingdings 2" w:hint="default"/>
</w:rPr>
</w:lvl>
...
</w:abstractNum>
...
<w:abstractNum w:abstractNumId="8" w15:restartNumberingAfterBreak="0">
<w:nsid w:val="6DA523B5"/>
<w:multiLevelType w:val="hybridMultilevel"/>
<w:tmpl w:val="D0A2943E"/>
<w:lvl w:ilvl="0" w:tplc="CBCE2CF0">
<w:start w:val="1"/>
<w:numFmt w:val="bullet"/>
<w:lvlText w:val=""/>
<w:lvlJc w:val="left"/>
<w:pPr>
<w:ind w:left="360" w:hanging="360"/>
</w:pPr>
<w:rPr>
<w:rFonts w:ascii="Wingdings 2" w:hAnsi="Wingdings 2" w:hint="default"/>
</w:rPr>
</w:lvl>
...
</w:abstractNum>
...
<w:num w:numId="5" w16cid:durableId="963343858">
<w:abstractNumId w:val="0"/>
</w:num>
...
<w:num w:numId="9" w16cid:durableId="324748400">
<w:abstractNumId w:val="8"/>
</w:num>
styles.xml
<w:style w:type="paragraph" w:styleId="ListParagraph">
<w:name w:val="List Paragraph"/>
<w:basedOn w:val="Normal"/>
<w:link w:val="ListParagraphChar"/>
<w:uiPriority w:val="34"/>
<w:qFormat/>
<w:rsid w:val="007205D3"/>
<w:pPr>
<w:ind w:left="720"/>
<w:contextualSpacing/>
</w:pPr>
</w:style>
To show my actual XPath implementation: I am running an XSLT, via PowerShell, that transforms document.xml (with a document() reference to numbering.xml) to output the text and symbol of every bullet point.
style.xsl
<xsl:stylesheet version="1.0" xmlns:xsl="http://www.w3.org/1999/XSL/Transform"
xmlns:w="http://schemas.openxmlformats.org/wordprocessingml/2006/main">
<xsl:output encoding="UTF-8" omit-xml-declaration="yes" indent="yes"/>
<xsl:strip-space elements="*"/>
<xsl:template match="/">
<data>
<xsl:apply-templates select="descendant::w:tbl"/>
</data>
</xsl:template>
<xsl:template match="w:tbl">
<xsl:apply-templates select="descendant::w:p[descendant::w:t != '']"/>
</xsl:template>
<xsl:template match="w:p">
<xsl:variable name="num_id" select="w:pPr/w:numPr/w:numId/@w:val"/>
<xsl:variable name="lvl_id" select="w:pPr/w:numPr/w:ilvl/@w:val"/>
<xsl:variable name="abs_id" select="document('numbering.xml')/w:numbering/
w:num[@w:numId = $num_id]/w:abstractNumId/@w:val" />
<xsl:variable name="num_val" select="document('numbering.xml')/w:numbering/
w:abstractNum[@w:abstractNumId = $abs_id]/
w:lvl[@w:ilvl = $lvl_id]/w:lvlText/@w:val"/>
<xsl:variable name="square_bullet"><![CDATA[]]></xsl:variable>
<xsl:variable name="circle_bullet"><![CDATA[]]></xsl:variable>
<row>
<text>
<xsl:value-of select="."/>
</text>
<symbol>
<xsl:value-of select="$num_val"/>
</symbol>
<type>
<xsl:choose>
<xsl:when test="$num_val = $square_bullet">
<xsl:text>Checkbox</xsl:text>
</xsl:when>
<xsl:when test="$num_val = $circle_bullet">
<xsl:text>Radio</xsl:text>
</xsl:when>
<xsl:otherwise>Text</xsl:otherwise>
</xsl:choose>
</type>
</row>
</xsl:template>
<xsl:template match="text()">
<xsl:value-of select='normalize-space()'/>
</xsl:template>
</xsl:stylesheet>
transform.ps1
$xslSettings = New-Object System.Xml.Xsl.XsltSettings($true, $false);
$xmlResolver = New-Object System.Xml.XmlUrlResolver;
$xslt = New-Object System.Xml.Xsl.XslCompiledTransform;
$xslt.Load("style.xsl", $xslSettings, $xmlResolver);
$xslt.Transform("document.xml", "output.xml");
output.xml
<data xmlns:w="http://schemas.openxmlformats.org/wordprocessingml/2006/main">
<row>
<text>Mainstreaming environmental considerations into social and economic decisions at all levels is of vital importance</text>
<symbol></symbol>
<type>Text</type>
</row>
<row>
<text>Solutions need to seek an integrated approach that simultaneously address the conservation of the planet’s genetic diversity, species and ecosystems</text>
<symbol></symbol>
<type>Text</type>
</row>
</data>