0

I am trying to write a helper script for a colleague that will automatically open up all .doc(x) files in a directory, find any and all chinese characters, set their Font, save and close.

I already have a working version of this script. The file opening/saving/closing part is handled in Python/win32com and works fine. My big point of contention is still the VBA macro.

I know there is a regex (\p{Han}) that should be able to catch all Chinese characters, but this does not seem to work in VBA. Similarly, I have tried using Unicode Ranges and Chr(W). Nothing so far produced any output, let alone correct output. Out of frustration, I made one last ditch attempt and simply inverted the search paramters. This is how it is now:

Sub FindReplace_zh(Rng As Range)
    With Rng.Find
        Do While .Execute(FindText:="[!A-ZÄÖÜa-zäöü0-9><_ ^11^13§$²³%#&/\+-]", MatchWildcards:=True)
            If Rng.Font.Bold = True And Rng.Font.Name Like "Arial*" Then
                Rng.Font.Name = "SimHei"
            ElseIf Rng.Font.Bold = False And Rng.Font.Name Like "Arial*" Then
                Rng.Font.Name = "SimSun"
            End If
            Rng.Collapse 0
        Loop
    End With
End Sub

AT LEAST THIS WORKS, but its far from elegant and still produces some undesired output.

I have yet to understand how I can substitute "[!A-ZÄÖÜa-zäöü0-9><_ ^11^13§$²³%#&/+-]" with a variable, or most anything else. Many characters are not covered by this regex, such as "(", ")" etc., but adding them (even escaped with ) will result in runtime errors in VBA. I found a lot of tutorials and questions dealing with removing or inserting text, but my specific case of finding text and then changing the font, while leaving everything else untouched, seems rather specific.

Fun fact: I had to add ^11 and ^13 to the regex list, as not including them would lead to the Macro inserting new linebreaks in random positions of the .doc

EDIT: New try with comment:

Dim searchPattern As String
searchPattern = "[" & ChrW(&H2E80) & "-" & ChrW(&HFFED) & "]{1,}"
    With Rng.Find
        Do While .Execute(FindText:=searchPattern, MatchWildcards:=True)

Invalid operation on final line! I also would not have concatinated a string like this. I am not sure how VBA parses this, but apprently not the way we hoped.

EDIT2: FIX

Removing "{1,}" from searchPattern did it. Now it works exactly as I expected it to :)

searchPattern = "[" & ChrW(&H2E80) & "-" & ChrW(&HFFED) & "]"
tst
  • 371
  • 1
  • 11
  • In terms of Word Find/Replace with wildcards the ! means not any of the following characters. Is this what you intended? or did you mean the character '!'. – freeflow Oct 21 '19 at 11:37
  • This is precisely my workaround. Instead of having a regex that finds Chinese characters, I wrote a regex that finds anything but the characters in the negative set. So all standard latin characters + some Umlaut characters and a range of special characters. The problem is that this list is a) not exhaustive enough and b) can never be complete due to the regex-restrictions in VBA (you cannot add "\)" or even "\^" without getting an error from Visual Basic. – tst Oct 21 '19 at 11:43
  • You need to read the Microsoft help page on find and replace wildcards. The wildcards are similar to, but not the same as, Regex. Your assertions on ")" etc are untrue because I have used such symbols many time in find/replace in word VBA without any VBA issues. – freeflow Oct 21 '19 at 12:03
  • I have been reading a lot of VBA/MS help pages but nothing so far worked. That said, this is just the workaround. While improving this regex would of course be beneficial, I much rather get the actual solution in a more straight-forward manner if possible, i.e.: a regex or piece of code that finds and manipulates _only_ Chinese characters. – tst Oct 21 '19 at 12:05
  • this was the first Item I find when I googled. http://pinyin.info/news/2016/how-to-find-chinese-characters-in-an-ms-word-document/ – freeflow Oct 21 '19 at 12:07
  • Yes. I tried stuff like this. The problem is that VBA is incapable of rendering anything outside of ASCII. If you copy `[⺀-■]{1,}` into the VBA env you get this: `[?-?]{1,}` VBA will not accept this as valid code. – tst Oct 21 '19 at 12:13
  • 1
    Paste the characters into word and then print the hex value of each character. This will tell you which ChrW characters to use to build your string. "[⺀-■]" give the hex values of 5B,2E80,2D,FFED,5D. Therefore you can build your string as "[" & ChrW(&H2E80) & "-" & ChrW(&HFFED) & "]{1,}" – Freeflow 9 mins ago – freeflow Oct 21 '19 at 14:20
  • Doesn't work either, unfortunately :( See OP Edit. – tst Oct 23 '19 at 08:01
  • HA! I fixed it. The string you concatinated was fine, but the final "{1,}" broke it somehow! Please put this below, so I can mark as fixed :) – tst Oct 23 '19 at 08:28

1 Answers1

1

It is possible to find the value of characters that cannot be represented in the VBIDE by pasting them into an empty Word document and then using VBA to print the AscW values of each character in the text you wish to investigate. You can then use ChrW in VBA to reassemble the text in a VBA friendly way.

From

pinyin.info/news/2016/…

You can use the find string "[⺀-■]{1,}" to find any Chinese character. However as you have noted when you paste this text into the VBA IDE you get [?-?]{1,} because VBA uses UTD-8 as its character set. (I think).

The following code

Public Sub PrintCharacterValues()

Dim myIndex As Long

    With ActiveDocument.Paragraphs(1).Range

        For myIndex = 1 To 8

            Debug.Print .Characters(myIndex), AscW(.Characters(myIndex)), Hex(AscW(.Characters(myIndex)))

        Next

    End With

End Sub

Gives the output of

"              34           22
[              91           5B
?              11904        2E80
-              45           2D
?             -19           FFED
]              93           5D
"              34           22
               160          A0

Thus you can get the critical section of the find string as

"[" & ChrW(&H2£80) & "-" & ChrW(&HFFED) &"]"
freeflow
  • 4,129
  • 3
  • 10
  • 18