Goal: Programmatically process a Word document with bibliographic citations (ultimately from EndNote) and convert it to another format (LyX).
How do I go through the document so that I can recreate it? Where a citation occurs, I only want to output a reference to the cited work, not the characters that appear in the document. Where the bibliography occurs, I only want to output a command to insert the bibliography, not the textual sources.
Sample Input (as it appears in MS Word when field codes are hidden):
As was said[1] it’s a long way to tip
- Agha, S., Patterns of use of the female condom after one year of mass marketing. AIDS Educ Prev, 2001. 13(1): p. 55-64.
Desired Output style (pseudo-LaTeX):
As was said\ref{bib526} it’s a long way to tip.
\bibliography
The bib526 would be extracted from the Field.Code.
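To make that extraction concrete, here is a minimal Python sketch. It assumes the text of Field.Code contains the EndNote record number in a `<RecNum>` element, roughly as in the made-up sample string below. That layout, the sample string, and the helper name `citation_key` are all assumptions for illustration; the real field code should be inspected before relying on this.

```python
import re
from typing import Optional

# Hypothetical field-code text; the real contents of Field.Code may differ.
SAMPLE_CODE = (
    " ADDIN EN.CITE <EndNote><Cite><Author>Agha</Author>"
    "<Year>2001</Year><RecNum>526</RecNum></Cite></EndNote>"
)

# Assumption: the record number sits in a <RecNum> element.
_RECNUM = re.compile(r"<RecNum>\s*(\d+)\s*</RecNum>")


def citation_key(field_code: str, prefix: str = "bib") -> Optional[str]:
    """Return a key such as 'bib526', or None if no record number is found."""
    match = _RECNUM.search(field_code)
    if match is None:
        return None
    return prefix + match.group(1)


print(citation_key(SAMPLE_CODE))  # bib526
```

A field whose code has no record number (for example a hyperlink field) yields None, which gives the caller a simple way to skip it.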
The following program illustrates several oddities:
- Document.Characters.Count is much smaller than the actual number of characters in the document range. This appears to reflect the many hidden characters in, e.g., Field.Code, but I'm not sure whether the characters that appear in the bibliography count.
- The Field appears to be repeated across some of the printed characters in "[1]". How can I ensure that I output it only once, and that I do not output any of the printed characters for the reference? (There are 2 fields in some cases because there is both a citation field and a hyperlink field.)
- Apparently Document may have several stories in StoryRanges, though my little test only has one. Which should I pay attention to?
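Public Sub inspect()
    Dim a As Document
    Dim r As Range
    Dim ic As Long

    Set a = ActiveDocument
    Set r = a.Range
    ' Compare the raw extent of the range with the number of items in Characters.
    Debug.Print "Document ranges from"; r.Start; "to"; r.End; "with"; _
        r.Characters.Count; "Characters"
    ' Show extent, field count, and text for a few characters around the citation.
    For ic = 11 To 16
        Set r = a.Characters(ic)
        Debug.Print "Character"; ic; r.Start; r.End; r.Fields.Count; r.Text
    Next ic
End Sub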
Output:
Document ranges from 0 to 6137 with 659 Characters
Columns are
A = position in `Document.Characters`
B, C = start, end of corresponding range (often more than 1 character!)
D = # of fields in the character
E = printed representation of the character
            A     B     C  D  E
Character  11    10    11  0  d
Character  12    11  1559  2  [
Character  13  1559  1607  1  1
Character  14  1607  1609  2  ]
Character  15  1609  1610  0
Character  16  1610  1611  0  i
I might do the actual program in Python, not VBA.
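For the Python version, one possible way to emit each citation only once is sketched below. It does not talk to Word at all: it works on a simplified model in which every character has already been tagged with the citation field that encloses it, if any. The `Char` class, the `(field start, key)` tuple, and the function name `to_pseudo_latex` are hypothetical, and the sample data only mimics characters 11 to 16 from the output above. Hyperlink fields and the bibliography are assumed to be filtered out before this step.

```python
from dataclasses import dataclass
from typing import Iterable, Optional, Tuple


@dataclass(frozen=True)
class Char:
    text: str
    # (start offset of the enclosing citation field, citation key),
    # or None for ordinary text. Both values are assumed to be known already.
    citation: Optional[Tuple[int, str]] = None


def to_pseudo_latex(chars: Iterable[Char]) -> str:
    """Copy ordinary text; replace each run of cited characters with one \\ref."""
    pieces = []
    previous = None
    for ch in chars:
        if ch.citation is None:
            pieces.append(ch.text)
        elif ch.citation != previous:
            # First character of a new citation field: emit the reference once.
            pieces.append("\\ref{" + ch.citation[1] + "}")
        # Later characters of the same field are dropped silently.
        previous = ch.citation
    return "".join(pieces)


cite = (11, "bib526")
sample = [Char("d"), Char("[", cite), Char("1", cite), Char("]", cite),
          Char(" "), Char("i")]
print(to_pseudo_latex(sample))  # d\ref{bib526} i
```

Comparing the field's start offset along with the key means two adjacent citations of the same work are still emitted separately, while the three printed characters of "[1]" collapse into a single reference.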