I am trying to loop through all the sentences in a Word document and parse them into semi-HTML code. During testing, I ran into an interesting situation where any sentence followed by a non-closed sentence would be skipped. For example, if I have the following two sentences:

This is the first sentence in a paragraph with special characters and there should be one more sentence. This is the second sentence that should be there.**

When I loop through each sentence in para.Range.Sentences, I only get the first sentence and the ".**" at the end of the paragraph. However, if I add a space between the period and the asterisks (". **"), the code works.

How can I make sure the macro reads all the text in a sentence, even if there isn't a space after the period? My example code is below:

Public Sub ParseDoc()
    Dim paras As Paragraphs
    Dim para As Paragraph
    Dim sents As Sentences
    Dim sent As Range

    Set paras = ActiveDocument.Paragraphs
    For Each para In paras
        ' Enumerate the sentences Word detects in this paragraph
        Set sents = para.Range.Sentences
        For Each sent In sents
            MsgBox sent.Text
        Next sent
    Next para
End Sub
Ben Rhys-Lewis
Michael

2 Answers

It seems to be a problem with the first asterisk. Changing that first asterisk to any other character makes this code run as you are hoping. I do not know if this is a special behavior, but if you reference ActiveDocument.Paragraphs(1).Range.Sentences(2).Text directly, the full text of the sentence is as you are expecting.

With a simple reworking of the loops to use While...Wend and incrementing counters, you can reference the items by their index instead.
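
One way to read the index-based rework suggested here is sketched below. It uses For...Next with a counter rather than While...Wend, which amounts to the same access pattern. The procedure name ParseDocByIndex and the counter i are made up for illustration, and Debug.Print stands in for MsgBox. This has not been tested against the ".**" case, so treat it as a starting point rather than a confirmed fix.

```vba
Public Sub ParseDocByIndex()
    ' Hypothetical name; not part of the original question or answer
    Dim para As Paragraph
    Dim i As Long

    For Each para In ActiveDocument.Paragraphs
        ' Access each sentence by index instead of enumerating the
        ' Sentences collection with For Each
        For i = 1 To para.Range.Sentences.Count
            Debug.Print para.Range.Sentences(i).Text
        Next i
    Next para
End Sub
```

Output goes to the Immediate window (Ctrl+G in the VBA editor), which avoids dismissing a message box for every sentence.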

horatio
  • Thanks for the answer. I noticed something similar with a "#" character, but most text characters are OK. It's a little curious why this is happening. Let me rewrite my loops and see if I can get it to work. – Michael May 22 '13 at 17:07
  • I did look very quickly for any notes about escape sequences but there're a lot of irrelevant search results. – horatio May 22 '13 at 17:37

I couldn't figure out how to "read" all the characters in a sentence of the form "words.special_character", but I realized that if I first replace every period+special_character instance in the Word document with period+space+special_character, all my For Each loops work. I put the following code at the very beginning of my Sub and everything worked as expected:

'Adds a <SPACE> between a period and a non-alphanumeric character
With ActiveDocument.Range.Find
    'A-Za-z instead of A-z: the A-z range also covers [ \ ] ^ _ and `
    .Text = ".([!0-9A-Za-z ])"
    .Replacement.Text = ". \1"
    .MatchWildcards = True
    .Execute Replace:=wdReplaceAll
End With
Michael