I'm trying to autogenerate a set of 80+ text tables, each with near-identical formatting, in Microsoft Word 2013.
The tables have these common characteristics:
- A caption in text above each table, with several spots to insert canned text.
- A single header row in the table, which has thick top and bottom borders.
- N rows of data, with 5 columns each, where N can vary for each table.
- Data sourced from some space separated text file, with a single header line.
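For illustration, reading one such data file could be sketched as below. This is only a minimal sketch in Python (standing in for the Bash scripting mentioned later); the function name `read_table`, the column names, and all sample values except 340 are made up for the example.

```python
def read_table(lines):
    """Return (header, rows) from whitespace-separated text lines."""
    # Skip blank lines and split each remaining line on runs of whitespace.
    records = [line.split() for line in lines if line.strip()]
    if not records:
        raise ValueError("no header line found")
    header, rows = records[0], records[1:]
    # Every data row is expected to have as many fields as the header.
    for number, row in enumerate(rows, start=1):
        if len(row) != len(header):
            raise ValueError(
                f"data row {number} has {len(row)} fields, expected {len(header)}"
            )
    return header, rows


# Hypothetical sample: one header line, then N data rows of 5 columns each.
sample = """\
c1 c2 c3 c4 c5
340 123.45(6) 1.234(5) 12.34(5) 12.34(5)
350 124.10(3) 1.301(2) 12.90(4) 13.02(6)
"""

header, rows = read_table(sample.splitlines())
print(header)
print(len(rows), "data rows")
```

Splitting on any run of whitespace means that uneven column alignment in the text file does not matter, as long as no field itself contains a space.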
My idea was to save a sample table as a Word XML format document and then replicate it using Bash scripting in Cygwin. Yes, I know this is sort of roundabout compared to using VBA, but I already know how to do the scripting side of this, whereas I don't know much about generating tables in VBA, and my preliminary reading suggested that would be the more difficult route learning-wise compared with simply making sense of the XML format and parsing it.
Anyhow, the Word XML file I get is rather... verbose. A table with 11 rows of data takes up 91,100+ characters. Digging in, I see that much of the issue is a lack of common formatting.
About 44,000 of those characters are devoted to a huge closing block of XML that covers things like fonts, etc. A header block of XML settings takes up around 5,000. I'm going to just leave these parts alone, as they're clearly complicated and don't really hinder my primary objective.
That leaves about 42,000 characters' worth of XML, all devoted to a single table, which in this case is 12 rows (the 11 data rows plus the header row) by 5 columns, with all cell text entries being less than 10 characters long.
An example of a single entry in a row is:
<w:tc><w:tcPr><w:tcW w:w="602" w:type="pct"/><w:vAlign w:val="center"/></w:tcPr><w:p w:rsidR="00B54027" w:rsidRPr="008D2D25" w:rsidRDefault="00B54027" w:rsidP="000C5234"><w:pPr><w:rPr><w:rFonts w:ascii="Times New Roman" w:eastAsia="Times New Roman" w:hAnsi="Times New Roman" w:cs="Times New Roman"/></w:rPr></w:pPr><w:r w:rsidRPr="008D2D25"><w:rPr><w:rFonts w:ascii="Times New Roman" w:eastAsia="Times New Roman" w:hAnsi="Times New Roman" w:cs="Times New Roman"/></w:rPr><w:t>340</w:t></w:r></w:p></w:tc>
Aha, so that's 500+ characters. So 12 rows x 5 cells x ~500 characters per cell = 30,000 characters... add in the extra row formatting, etc. and you have the rest of the mess.
Based on my reading of "Running (a.k.a. -ing) with Word", I have a basic understanding of the syntax and have an idea of how I would like to see my table entries condensed.
My goal is to output something like this:
<w:tr>
<w:tc>
<w:p>
<w:r>
<w:t>
###
</w:t>
</w:r>
</w:p>
</w:tc>
<w:tc>
<w:p>
<w:r>
<w:t>
###.##(#)
</w:t>
</w:r>
</w:p>
</w:tc>
<w:tc>
<w:p>
<w:r>
<w:t>
#.###(#)
</w:t>
</w:r>
</w:p>
</w:tc>
<w:tc>
<w:p>
<w:r>
<w:t>
##.##(#)
</w:t>
</w:r>
</w:p>
</w:tc>
<w:tc>
<w:p>
<w:r>
<w:t>
##.##(#)
</w:t>
</w:r>
</w:p>
</w:tc>
</w:tr>
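For illustration, a row in the condensed form above could be generated from one record of data roughly as follows. This is a hedged sketch in Python rather than Bash; `make_cell` and `make_row` are hypothetical helper names, all sample values except 340 are made up, and whether Word accepts cells this bare (with no per-cell properties) is an assumption, not something confirmed here.

```python
from xml.sax.saxutils import escape


def make_cell(text):
    """Wrap one value in the minimal cell/paragraph/run/text nesting."""
    # escape() protects against '&', '<' and '>' appearing in the data.
    return "<w:tc><w:p><w:r><w:t>{}</w:t></w:r></w:p></w:tc>".format(escape(text))


def make_row(values):
    """Build one <w:tr> element containing a cell per value."""
    return "<w:tr>" + "".join(make_cell(value) for value in values) + "</w:tr>"


# Hypothetical record following the ### / ###.##(#) patterns shown above.
row = make_row(["340", "123.45(6)", "1.234(5)", "12.34(5)", "12.34(5)"])
print(row)
```

The sketch puts each value directly between `<w:t>` and `</w:t>` with no line breaks, since whitespace inside a text element may end up as part of the cell text.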
And then to push the formatting to a higher level, as most of it is shared, e.g. all my text is Times New Roman, size 12, vertically center aligned, and horizontally right aligned.
My questions are:
- Is there a way to set a format (e.g. <w:pPr><w:rPr><w:rFonts w:ascii="Times New Roman" w:eastAsia="Times New Roman" w:hAnsi="Times New Roman" w:cs="Times New Roman"/></w:rPr></w:pPr>) that applies to the entire table in terms of the font, without having to repeat it each time a run of text begins?
- Is there a way to pull out the width (e.g. <w:tcPr><w:tcW w:w="602" w:type="pct"/><w:vAlign w:val="center"/></w:tcPr>) and apply it at the column level, so I don't have to redefine it for each row's cells when all the cells within a given column should have a single consistent width?
- From the documentation it appears rsidR/rsidRPr/rsidRDefault are identifiers related to the revision... which I assume wouldn't really have much meaning in my autogenerated document. Are there downsides to deleting those id settings from the <w:p> and <w:r> tags?
- (Probably the most important question) Would this be much easier in VBA?
With regard to #4, I'm not asking for VBA code, just a general sense of whether inserting a canned text caption, inserting a new table, reading in an arbitrary number of lines of space-separated data to fill in the table cells, sizing the table appropriately, and repeating said process 80+ times would run into any serious barriers or obstacles.
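To make the third question concrete, the deletion of the revision identifiers could be done mechanically along these lines. This is only a sketch in Python; `strip_rsids` and the regular expression are illustrative assumptions, and it shows the text transformation only, not whether Word is happy with the result.

```python
import re

# Matches any attribute whose name starts with "w:rsid" (rsidR, rsidRPr,
# rsidRDefault, rsidP, ...) together with the whitespace in front of it.
RSID_ATTR = re.compile(r'\s+w:rsid\w*="[^"]*"')


def strip_rsids(xml_text):
    """Remove w:rsid* attributes from a string of WordprocessingML."""
    return RSID_ATTR.sub("", xml_text)


# Opening tags copied from the sample cell shown earlier.
paragraph_open = (
    '<w:p w:rsidR="00B54027" w:rsidRPr="008D2D25" '
    'w:rsidRDefault="00B54027" w:rsidP="000C5234">'
)
run_open = '<w:r w:rsidRPr="008D2D25">'

print(strip_rsids(paragraph_open))  # prints <w:p>
print(strip_rsids(run_open))        # prints <w:r>
```

Because the pattern requires leading whitespace and an `=` after the name, it only touches attributes and leaves element names and text content alone.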