I need to parse XML files that have a number of invalid characters in them. Here is the VB6/VBA code I use to parse a file and replace the invalid characters:
Dim xmldoc As MSXML2.DOMDocument60
Dim xmlNode As MSXML2.IXMLDOMNode
Dim xmlNodeList As MSXML2.IXMLDOMNodeList
Dim XML As String
Dim fno As Integer
Dim ptr As Long   ' position reported by parseError.filepos
' get the XML file
fno = FreeFile
Open "input.xml" For Input As #fno
XML = Input(LOF(fno), fno)
Close #fno
TOP_OF_CODE:
Set xmldoc = New MSXML2.DOMDocument60
xmldoc.LoadXML XML
Set xmlNodeList = xmldoc.getElementsByTagName("*")
For Each xmlNode In xmlNodeList
    ' (a bunch of code to parse the XML)
Next xmlNode
If xmldoc.parseError.errorCode <> 0 And xmldoc.parseError.reason = "An invalid character was found in text content." & vbCrLf Then
    ' invalid character was found: overwrite it with a placeholder and parse again
    ptr = xmldoc.parseError.filepos
    XML = Left(XML, ptr - 1) & "x" & Mid(XML, ptr + 1)
    Set xmldoc = Nothing
    GoTo TOP_OF_CODE
End If
Much of the time the code works exactly as intended: each invalid character is replaced in turn, and then the parsing takes place. Sometimes, however, things get "stuck": the parser keeps reporting an invalid character at the same position even after I've replaced the character at that position with a valid one. I have tried substituting various replacement characters, and I have also tried simply deleting the character at that position. I still get an invalid character error at the same place. Any clues?
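For comparison, one alternative to the retry loop is to scrub the string once before calling LoadXML. The following is only a minimal sketch, and it assumes the errors come from code points outside the XML 1.0 Char range (#x9, #xA, #xD, #x20-#xD7FF, #xE000-#xFFFD, plus valid surrogate pairs). The names ScrubXmlText and IsXmlChar are hypothetical helpers, not part of MSXML, and the sketch does not address why filepos might point at an unexpected position.

```vba
Option Explicit

' Hypothetical helper: True if a single UTF-16 code unit is allowed by the
' XML 1.0 Char production. Surrogates are handled by the caller.
Private Function IsXmlChar(ByVal code As Long) As Boolean
    Select Case code
        Case &H9, &HA, &HD
            IsXmlChar = True
        Case &H20 To &HD7FF&
            IsXmlChar = True
        Case &HE000& To &HFFFD&
            IsXmlChar = True
        Case Else
            IsXmlChar = False
    End Select
End Function

' Hypothetical helper: returns a copy of source in which every code unit that
' XML 1.0 disallows is overwritten with a one-character filler.
Public Function ScrubXmlText(ByVal source As String, _
                             Optional ByVal replacement As String = "x") As String
    Dim result As String
    Dim filler As String
    Dim i As Long
    Dim code As Long
    Dim nextCode As Long

    result = source
    filler = Left$(replacement, 1)
    If Len(filler) = 0 Then filler = "x"

    i = 1
    Do While i <= Len(result)
        ' AscW returns a signed Integer, so mask it into the range 0..65535
        code = AscW(Mid$(result, i, 1)) And &HFFFF&

        ' Look ahead only when the current unit is a high surrogate
        nextCode = 0
        If code >= &HD800& And code <= &HDBFF& And i < Len(result) Then
            nextCode = AscW(Mid$(result, i + 1, 1)) And &HFFFF&
        End If

        If nextCode >= &HDC00& And nextCode <= &HDFFF& Then
            ' Valid surrogate pair (a code point above #xFFFF): keep both units
            i = i + 2
        Else
            ' Lone surrogates fall through to Case Else and are replaced too
            If Not IsXmlChar(code) Then Mid$(result, i, 1) = filler
            i = i + 1
        End If
    Loop

    ScrubXmlText = result
End Function
```

With this in place, the load step would become xmldoc.LoadXML ScrubXmlText(XML). Because the filler is always exactly one character, the string length and all character positions stay the same as in the original text.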