
I'm trying to autogenerate a set of 80+ text tables in Microsoft Word 2013, each with nearly identical formatting.

The tables have these common characteristics:

  1. A caption in text above each table, with several spots to insert canned text.
  2. A single header row in the table, which has thick top and bottom borders.
  3. N rows of data, with 5 columns each, where N can vary for each table.
  4. Data sourced from a space-separated text file with a single header line.

My idea was to save a sample table as a Word XML format document and then replicate it using BASH scripting in Cygwin. Yes, I know this is roundabout compared to using VBA, but I already know how to do the scripting side, whereas I don't know much about generating tables in VBA. My preliminary reading suggested VBA would be the harder route to learn, compared with simply making sense of the XML format and parsing it.

Anyhow, the Word XML file I get is rather... verbose. A table with 11 rows of data takes up 91,100+ characters. Digging in, I see that much of the bloat comes from formatting that is repeated rather than shared.

About 44,000 of those characters are devoted to a huge block of closing XML that covers things like fonts, etc. A header block of XML settings takes up around 5,000. I'm going to leave these parts alone, as they're clearly complicated and don't really hinder my primary objective.

That leaves about 42,000 characters of XML, all devoted to a single table, which in this case is 12 rows (1 header row plus 11 data rows) by 5 columns, with every cell's text being less than 10 characters long.

An example of a single entry in a row is:

<w:tc><w:tcPr><w:tcW w:w="602" w:type="pct"/><w:vAlign w:val="center"/></w:tcPr><w:p w:rsidR="00B54027" w:rsidRPr="008D2D25" w:rsidRDefault="00B54027" w:rsidP="000C5234"><w:pPr><w:rPr><w:rFonts w:ascii="Times New Roman" w:eastAsia="Times New Roman" w:hAnsi="Times New Roman" w:cs="Times New Roman"/></w:rPr></w:pPr><w:r w:rsidRPr="008D2D25"><w:rPr><w:rFonts w:ascii="Times New Roman" w:eastAsia="Times New Roman" w:hAnsi="Times New Roman" w:cs="Times New Roman"/></w:rPr><w:t>340</w:t></w:r></w:p></w:tc>

Aha, so that's 500+ characters. So 12 rows x 5 cells x ~500 characters per cell = 30,000 characters... add in the extra row formatting, etc. and you have the rest of the mess.

Based on my reading of "Running (a.k.a. -ing) with Word", I have a basic understanding of the syntax and have an idea of how I would like to see my table entries condensed.

My goal is to end up with something like this:

<w:tr>
   <w:tc>
      <w:p>
         <w:r>
            <w:t>
               ###
            </w:t>
         </w:r>
      </w:p>
   </w:tc>
   <w:tc>
      <w:p>
         <w:r>
            <w:t>
               ###.##(#)
            </w:t>
         </w:r>
      </w:p>
   </w:tc>
   <w:tc>
      <w:p>
         <w:r>
            <w:t>
               #.###(#)
            </w:t>
         </w:r>
      </w:p>
   </w:tc>
   <w:tc>
      <w:p>
         <w:r>
            <w:t>
               ##.##(#)
            </w:t>
         </w:r>
      </w:p>
   </w:tc>
   <w:tc>
      <w:p>
         <w:r>
            <w:t>
               ##.##(#)
            </w:t>
         </w:r>
      </w:p>
   </w:tc>
</w:tr>
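A minimal sketch of how rows in that condensed form could be generated from the space-separated data, written in Python rather than BASH for brevity. The function names (`cell_xml`, `row_xml`, `rows_from_text`), the header names, and all sample values except `340` are made up for illustration. It keeps each value on one line inside `<w:t>` to avoid any question about how whitespace there is treated, and it only produces the row fragments; splicing them into the saved template is not shown.

```python
from xml.sax.saxutils import escape


def cell_xml(text):
    """Return one bare <w:tc> cell holding `text`, with no formatting tags."""
    return "<w:tc><w:p><w:r><w:t>{}</w:t></w:r></w:p></w:tc>".format(escape(text))


def row_xml(fields):
    """Return one <w:tr> row built from a list of cell strings."""
    return "<w:tr>{}</w:tr>".format("".join(cell_xml(f) for f in fields))


def rows_from_text(data, ncols=5):
    """Turn space-separated text (first line is the header) into <w:tr> strings."""
    rows = []
    for line in data.splitlines():
        fields = line.split()
        if not fields:
            continue  # skip blank lines
        if len(fields) != ncols:
            raise ValueError(
                "expected {} columns, got {}: {!r}".format(ncols, len(fields), line)
            )
        rows.append(row_xml(fields))
    return rows


# Hypothetical input: one header line, then one data line in the
# ### / ###.##(#) / #.###(#) / ##.##(#) / ##.##(#) pattern shown above.
sample = "A B C D E\n340 612.45(3) 1.234(5) 12.34(2) 56.78(9)\n"
table_rows = rows_from_text(sample)
print(table_rows[1])
```

The header line is emitted as an ordinary row here; the thick top and bottom borders on the header would still have to come from the template or a style.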

I'd then like to push the formatting up to a higher level, since most of it is shared: all my text is Times New Roman, size 12, vertically center aligned, and horizontally right aligned.

My questions are:

  1. Is there a way to set a format (e.g. <w:pPr><w:rPr><w:rFonts w:ascii="Times New Roman" w:eastAsia="Times New Roman" w:hAnsi="Times New Roman" w:cs="Times New Roman"/></w:rPr></w:pPr>) that applies the font to the entire table, without having to repeat it each time a run of text begins?
  2. Is there a way to pull out the width (e.g. <w:tcPr><w:tcW w:w="602" w:type="pct"/><w:vAlign w:val="center"/></w:tcPr>) and apply it at the column level, so I don't have to redefine it for each row's cells when all cells within a given column should have a single consistent width?
  3. From the documentation it appears rsidR/rsidRPr/rsidRDefault are identifiers related to the revision... which I assume in my autogenerated document wouldn't really have much meaning. Are there downsides to deleting those id settings from <w:p> and <w:r> tags?
  4. (Probably the most important question) Would this be much easier in VBA?

With regard to #4, I'm not asking for VBA code, just a general sense of whether the following would run into any serious barriers or obstacles: inserting a canned text caption, inserting a new table, reading in an arbitrary number of lines of space-separated data to fill in the table cells, sizing the table appropriately, and repeating that process 80+ times.
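For question 3, a minimal sketch of what deleting those ids could look like, assuming the `w:rsid*` attributes can simply be dropped (that is the open question, not something established here). It is a plain regular expression over the raw text rather than a real XML parser, and the names `RSID_ATTR` and `strip_rsids` are made up for illustration. The input is the paragraph from the example cell above with its font tags left out.

```python
import re

# Matches any attribute whose name starts with w:rsid
# (w:rsidR, w:rsidRPr, w:rsidRDefault, w:rsidP, ...), plus the whitespace before it.
RSID_ATTR = re.compile(r'\s+w:rsid\w*="[^"]*"')


def strip_rsids(xml):
    """Return `xml` with every w:rsid* attribute removed."""
    return RSID_ATTR.sub("", xml)


cell = (
    '<w:p w:rsidR="00B54027" w:rsidRPr="008D2D25" w:rsidRDefault="00B54027" '
    'w:rsidP="000C5234"><w:r w:rsidRPr="008D2D25"><w:t>340</w:t></w:r></w:p>'
)
stripped = strip_rsids(cell)
print(stripped)
# prints <w:p><w:r><w:t>340</w:t></w:r></w:p>
```

On this one paragraph, the attributes account for well over half of the characters, so the same substitution applied to every cell would shrink the table noticeably even before the font tags are dealt with.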

Jason R. Mick
  • A bit broad for the SO format, you know... In a nutshell: *Word styles* are what's required to consolidate the formatting requirements. But styles "live" in a different XML file in the Word Open XML zip package. And learning about Table styles is somewhat complex. Yes, width can be set at the column level (and is not part of a table style). The rsId stuff can be omitted. IMO VBA would be much easier, but then, I could almost write that code without reference to Intellisense... There are a lot of examples on SO and elsewhere; look for an example that writes CSV to a range and then uses ConvertToTable. – Cindy Meister May 11 '16 at 04:55
  • So basically there's no way to get rid of all the font tags, but I can chop the id text and move the width spec to a higher level. I'm currently going the VBA route, but am curious about this approach as well, given that I keep running into areas where the VBA functionality is poorly defined or peculiar (e.g. defining certain properties for table styles like vertical alignment... not in the base style)... I can do a hackish workaround (individually setting the style), but the best practice is seemingly to make a style... and I'm stuck there for now. – Jason R. Mick May 11 '16 at 08:06
  • Well, you do get rid of the font tags in the *document.xml*. They're consolidated in the style definition and only appear once. You might find it instructive to create a simple document with table and table style, save it, then view the Word Open XML in the zip package. As to the VBA, again it's instructive to work in the Word UI in order to understand what's available - the object model mirrors the UI in most aspects. – Cindy Meister May 11 '16 at 16:45
  • Note re your question about vertical cell alignment: This can be set using Word Open XML. – Cindy Meister May 11 '16 at 18:15
  • Yeah, my VBA solution is close to done. I definitely learned/refreshed my knowledge, which is good, but I have to say I'm somewhat disappointed w/ the overall layout of Word-VBA & its deviations from standard `VB` on various things like `Type`, `initialization lists`, etc. It likely would have been quicker to use a more powerful scripting language (e.g. `BASH`) to selectively edit a template, as my application has a single fixed table style and is basically just dumping ~90 text files into corresponding tables. Word-VBA seems a bit clumsier than Excel-VBA... or that's my experience thus far. – Jason R. Mick May 11 '16 at 22:38
  • To give a bit of background, I'm writing a journal paper and it's going to have around 75 pages worth of tables in its supporting info generated via the `VBA` solution, which in turn eats up the data that my mix of `Python`, `BASH`, `Awk`, and `Expect` scripts (yes there's a bit of all of those in there) generate. I anticipate having to refresh the entire data set at least once more before publication due to minor updates / extensions of my data sets. Regenerating over 120 filtered postprocessed data sets; 50+ graphs and 90+ tables? That's when you're glad you've `automated`.... ;) – Jason R. Mick May 11 '16 at 22:44
  • Well, VBA (derived from classic VB, almost 20 years ago) pre-dates VB.NET by quite a bit, so it's not surprising that so much of the .NET Framework functionality isn't present. For its time, it wasn't too bad :-) But the changes in programming languages turn around at an ever-increasing pace, making VBA a very healthy, but aged "individual" :-) – Cindy Meister May 12 '16 at 17:07
