0

With spreadsheetML generated by Excel 2007 (and onwards into newer versions), the xl:Row and xl:Cell elements may include attributes for ss:Index which provides a row or column designation that "skips over" intervening entirely blank cells. This makes the markup in the spreadsheetML smaller, but is inconvenient when trying to evaluate a matrix in terms of a regular geometry where blank cells are actually present in the matrix.

I've made an XSLT 1.0 routine in msxml that converts e.g. <xsl:Row><xl:Cell ss:Index="3">dave</xl:Cell></xl:Row> type incoming markup to an "expanded" form like this:

<Row>
  <Col/><Col/><Col>dave</Col>
</Row>

Note the "blank" cells convert to an empty, self closing tag. This is helpful because then further XSLTs can assume that rows and columns present at the correct ordinal position within the Row/Col structure.

However this process is complex to process, and kind of slow. Has anyone tackled the challenge of "unpacking" ss:Index values by other mechanisms? You have to assume that "blank cells" may exist in the incoming data.

My processing platform is ASP classic, Jscript that operates inside of ASP, msxml, and yields the result back to a browser. However any and all perspectives are welcome, unconstrained by this platform description. The unpacking process ideally occurs on the server, as serialized XSLTs operate on the result.

Thank you, Stackoverflow readers!

Dave
  • 378
  • 4
  • 14
  • Could you elaborate on what you mean by "slow"? I assume that you have a `O(ss:Index)` complexity algorithm to expand the missing columns which should not be too slow. Or is the XSLT transformation as such too slow? – Marcus Rickert Apr 16 '14 at 22:43
  • It's not too bad up to 10K rows / columns or so, but beyond that the runtime exceeds the 8-second rule. Since it's generated on demand (a user issuing an http request), it kind of becomes unusable beyond that point. I was just thinking someone else might have encountered a faster approach, maybe to JSON or some kind of ASP object. Some of the optimizations only work well for certain data configurations, but not so well on others. Thanks for reading, Marcus. – Dave Apr 17 '14 at 18:16
  • 1
    Speed can depend on the oddest things; it may be worth experimenting to see if alternative formulations of the code speed things up (I got from 70 minutes to 4 seconds on one stylesheet once, just by rewriting). But your code may be I/O bound, in which case, good luck. – C. M. Sperberg-McQueen Apr 18 '14 at 01:44
  • SpreadSheetML is a pain in the bud. Answer below – stwissel Apr 23 '14 at 07:23

1 Answers1

1

The XML structure isn't what you would like it to be in the spreadsheet. I went through pain to sort it out (with MK chipping in at some point). What you need roughly is like this (there are a few more checks in it):

    <?xml version="1.0" encoding="UTF-8"?>
    <xsl:stylesheet version="1.0" xmlns:xsl="http://www.w3.org/1999/XSL/Transform" xmlns:e="urn:schemas-microsoft-com:office:spreadsheet" xmlns:ss="urn:schemas-microsoft-com:office:spreadsheet" exclude-result-prefixes="e ss">
        <xsl:output method="xml" version="1.0" encoding="UTF-8" indent="yes"/>
        <xsl:param name="sheetname">Sheet1</xsl:param>
        <xsl:template match="e:Workbook">
            <xsl:element name="importdata">
                <xsl:attribute name="type"><xsl:value-of select="$sheetname"/></xsl:attribute>
                <xsl:apply-templates select="e:Worksheet[@ss:Name=$sheetname]"/>
            </xsl:element>
        </xsl:template>
        <xsl:template match="e:Worksheet">
            <xsl:apply-templates select="e:Table/e:Row"/>
        </xsl:template>
        <xsl:template match="ss:Row">
            <xsl:if test="not(ss:Cell[1]/@ss:Index)">
                <!-- We do NOT process records where the first cell is empty -->
                <xsl:element name="record">
                    <xsl:attribute name="position"><xsl:value-of select="position()-1"/></xsl:attribute>
                    <xsl:apply-templates select="e:Cell"/>
                </xsl:element>
            </xsl:if>
        </xsl:template>
        <xsl:template match="ss:Cell">
            <xsl:variable name="curCol">
                <xsl:choose>
                    <xsl:when test="@ss:Index">
                        <xsl:value-of select="@ss:Index"/>
                    </xsl:when>
                    <xsl:otherwise>
                        <xsl:call-template name="cell-index">
                            <xsl:with-param name="idx" select="1"/>
                        </xsl:call-template>
                    </xsl:otherwise>
                </xsl:choose>
            </xsl:variable>
            <xsl:element name="field">
                <xsl:attribute name="col"><xsl:value-of select="$curCol"/></xsl:attribute>
                <xsl:attribute name="name"><xsl:value-of select="normalize-space(ancestor::e:Table/e:Row[position()=1]/e:Cell[position()=$curCol])"/></xsl:attribute>
                <xsl:value-of select="ss:Data"/>
            </xsl:element>
        </xsl:template>
        <xsl:template name="cell-index">
            <xsl:param name="idx"/>
            <xsl:if test="$idx &lt;= position()">
                <xsl:choose>
                    <xsl:when test="preceding-sibling::ss:Cell[position()=$idx]/@ss:Index">
                        <xsl:value-of select="preceding-sibling::ss:Cell[position()=$idx]/@ss:Index +$idx"/>
                    </xsl:when>
                    <xsl:when test="$idx = position()">
                        <xsl:value-of select="$idx"/>
                    </xsl:when>
                    <xsl:otherwise>
                        <xsl:call-template name="cell-index">
                            <xsl:with-param name="idx" select="$idx+1"/>
                        </xsl:call-template>
                    </xsl:otherwise>
                </xsl:choose>
            </xsl:if>
        </xsl:template>
        <!-- We don't process the first row since it has the field names in it -->
        <xsl:template match="ss:Row[position()=1]"/>
    </xsl:stylesheet>

Let us know how it goes

stwissel
  • 20,110
  • 6
  • 54
  • 101
  • Very nice! Will give it a try in a while (might take a few days). BTW, did you know blank rows act the same way, so if you want to account for them you have to use similar logic. – Dave Apr 24 '14 at 12:33