
Goal: Programmatically process a Word document with bibliographic citations (ultimately from EndNote) and convert it to another format (LyX).

How do I go through the document so that I can recreate it? Where a citation occurs, I only want to output a reference to the cited work, not the characters that appear in the document. Where the bibliography occurs, I only want to output a command to insert the bibliography, not the textual sources.

Sample input (as it appears in MS Word when field codes are hidden):

As was said[1] it’s a long way to tip

  1. Agha, S., Patterns of use of the female condom after one year of mass marketing. AIDS Educ Prev, 2001. 13(1): p. 55-64.

Desired Output style (pseudo-LaTeX):

As was said\ref{bib526} it’s a long way to tip.

\bibliography

The bib526 would be extracted from the Field.Code.
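A minimal sketch of that extraction in Python, in case it helps frame the question. It assumes the citation's field code text embeds EndNote XML with a `<RecNum>` element holding the record number; that layout, the `citation_keys` name, and the sample string are my assumptions for illustration, not something confirmed by the document above.

```python
import re

# Assumption: the EndNote field code contains <RecNum>N</RecNum> for each
# cited record. Adjust the pattern if the real field code differs.
_RECNUM = re.compile(r"<RecNum>\s*(\d+)\s*</RecNum>")


def citation_keys(field_code, prefix="bib"):
    """Return one key (e.g. 'bib526') per record number found in field_code."""
    return [prefix + num for num in _RECNUM.findall(field_code)]


# Hypothetical field code text, made up for this example.
sample = (
    " ADDIN EN.CITE <EndNote><Cite><Author>Agha</Author>"
    "<Year>2001</Year><RecNum>526</RecNum></Cite></EndNote>"
)
print(citation_keys(sample))
```

Returning a list rather than a single key leaves room for a citation that groups several works in one field.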

The following program illustrates several oddities:

  1. Document.Characters.Count is much shorter than the actual number of characters in the document range. This appears to reflect the many hidden characters in, e.g., Field.Code, but I'm not sure if the characters that appear in the bibliography count.
  2. The Field appears to be repeated across some of the printed characters in "[1]". How can I ensure that I output it only once, and that I do not output any of the printed characters for the reference? (There are 2 fields in some cases because there is both a citation field and a hyperlink field.)
  3. Apparently Document may have several stories in StoryRanges, though my little test only has one. Which should I pay attention to?
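For oddity 2, one approach I am considering is to walk the fields rather than the characters, and to emit only the outermost field covering each stretch of text. Below is a rough Python sketch of that idea on plain data. The `render` function and the `(start, end, replacement)` tuples are hypothetical; how to obtain such spans from Word is exactly what I am unsure about.

```python
def render(visible_text, fields):
    """Replace each outermost field span with its replacement string.

    fields: iterable of (start, end, replacement), with offsets into
    visible_text and end exclusive. A field that starts inside a span
    already emitted (e.g. a hyperlink inside a citation) is skipped.
    """
    out = []
    pos = 0
    # Sort by start, longest first, so an enclosing field comes before
    # any field nested inside it.
    for start, end, replacement in sorted(fields, key=lambda f: (f[0], -f[1])):
        if start < pos:
            continue  # nested in, or overlapping, a field already emitted
        out.append(visible_text[pos:start])
        out.append(replacement)
        pos = end
    out.append(visible_text[pos:])
    return "".join(out)


text = "As was said[1] it's a long way to tip"
fields = [
    (12, 13, "<hyperlink>"),    # inner hyperlink field around "1"
    (11, 14, r"\ref{bib526}"),  # outer citation field around "[1]"
]
print(render(text, fields))
```

With this input, the printed characters "[1]" are dropped and the citation is emitted once; the inner hyperlink field is ignored because it lies inside the citation span.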

[Without this the next block isn't formatted right. Don't know why.]

Public Sub inspect()
    Dim a As Document
    Dim r As Range
    Dim ic As Long
    Set a = ActiveDocument
    Set r = a.Range
    ' Compare the document's character positions with its Characters count.
    Debug.Print "Document ranges from"; r.Start; "to"; r.End; "with"; _
        r.Characters.Count; "Characters"
    ' Print the range of each character around the citation "[1]".
    For ic = 11 To 16
        Set r = a.Characters(ic)
        Debug.Print "Character"; ic; r.Start; r.End; r.Fields.Count; r.Text
    Next ic
End Sub

Output:

Document ranges from 0 to 6137 with 659 Characters
Columns are
A = position in `Document.Characters`
B, C = start, end of corresponding range (often more than 1 character!)
D = # of fields in the character
E = printed representation of the character
             A     B     C  D  E
Character   11    10    11  0  d
Character   12    11  1559  2  [
Character   13  1559  1607  1  1
Character   14  1607  1609  2  ]
Character   15  1609  1610  0  (space)
Character   16  1610  1611  0  i

I might do the actual program in Python, not VBA.
