0

I have an issue here. Currently, I made research on how to convert the encoded of .xml file from ANSI to UTF-8 and luckily and I found the solution. But there is one problem. My .xml file contain many Spanish language and of course, there is many inverted question mark symbol. In order for eclipse able to perfectly shown all the character in the .xml file, I need to change the encode of the .xml file from ANSI to UTF-8. I manage to change the encoded. But its weird. Even after change the encoded, the contain itself show the unknown character. Below is my code:

Dim objFso, objF As Object
Set objFso = CreateObject("Scripting.FileSystemObject")
xmlFile = NewFolderName & "\" & Application.Cells(5, j + 1).Value
            Set objF = objFso.CreateTextFile(xmlFile, True, False)
            objF.Write "<resources>"
            objF.WriteBlankLines (1)
            i = 11
            Var = Application.Cells(8, j + 1).Value
            Do Until Application.Cells(i, 2).Value = 0
                    objF.Write "     <string name=" & Chr(34) & Application.Cells(i, 2).Value & Var & Chr(34) & ">" & Application.Cells(i, j + 1).Value & "</string>"
                    objF.WriteBlankLines (1)
                i = i + 1
            Loop
            objF.WriteBlankLines (1)
            objF.Write ("</resources>")
            objF.Close
            Set stream = CreateObject("ADODB.Stream")
            stream.Open
            stream.Type = 2
            stream.Charset = "utf-8"
            stream.LoadFromFile xmlFile
            stream.SaveToFile xmlFile, 2
            stream.Close
            Set stream = Nothing

The output from above code is as shown below:

<string name="BroadcastFailed">No se recibi� emisi�n [E202]</string>
<string name="NoInputSelect">No hay selecci�n de entrada disponible</string>
<string name="ThreeDModeQ">�Ver en Modo 3D?</string>

above .xml output is encoded in UTF-8 but unknown characters still appear. What I want is like this:

<string name="BroadcastFailed">No se recibió emisión [E202]</string>
<string name="NoInputSelect">No hay selección de entrada disponible</string>
<string name="ThreeDModeQ">¿Ver en Modo 3D?</string>

anyone who knows what is the error in my code, please drop down your answer. I really appreciate your answers :):)

Hugues M.
  • 19,846
  • 6
  • 37
  • 65
squall leonhart
  • 301
  • 1
  • 5
  • 16

2 Answers2

0

The problem is that you are saving the initial file as ASCII (you are setting the Unicode parameter of CreateTextFile() to False). Per the documentation:

object.CreateTextFile(filename[, overwrite[, unicode]])

The CreateTextFile method has these parts:

Part Description

object Required. Always the name of a FileSystemObject or Folder object.

filename Required. String expression that identifies the file to create.

overwrite Optional. Boolean value that indicates if an existing file can be overwritten. The value is True if the file can be overwritten; False if it can't be overwritten. If omitted, existing files are not overwritten.

unicode Optional. Boolean value that indicates whether the file is created as a Unicode or ASCII file. The value is True if the file is created as a Unicode file; False if it's created as an ASCII file. If omitted, an ASCII file is assumed.

You are then loading the ASCII file as UTF-8. That is fine for ASCII characters (since ASCII is a subset of UTF-8), but you are losing non-ASCII characters, like ó and ¿. That is why you end up with (Unicode codepoint U+FFFD REPLACEMENT CHARACTER) characters in your final file.

You need to save the initial file as Unicode, and then load it into the ADODB.Stream as Unicode, then you won't lose any characters, and can then save the text in whatever charset you want afterwards:

Set objF = objFso.CreateTextFile(xmlFile, True, True) ' Unicode parameter is True
' ...
Set stream = CreateObject("ADODB.Stream")
stream.Type = 2
stream.Charset = "utf-16"
stream.Open
stream.LoadFromFile xmlFile ' load as Unicode
stream.Charset = "utf-8"
stream.SaveToFile xmlFile, 2 ' save as UTF-8
stream.Close
Remy Lebeau
  • 555,201
  • 31
  • 458
  • 770
  • Thanks for your answer. I've tried your solution just now but it still didn't solve the problem. The unknown character still appear and now the output structure becomes more weird. Below is the output: `< s t r i n g n a m e = " T h r e e D M o d e Q " > � V e r e n M o d o 3 D ? < / s t r i n g >` Seems that the structure is way too weird isn't it? – squall leonhart Jan 07 '15 at 04:45
  • I would expect it to look like that from `CreateTextFile()` outputting UTF-16, if the input string from the `Cells()` was ill-formed to begin with. I would not expect it to look like that from `SaveToFile()` outputting UTF-8. Did you verify the initial file has the correct text before loading it into the `ADODB.Stream`? Did you validate the `ADODB.Stream` contains the correct text before calling `SaveToFile()`? – Remy Lebeau Jan 07 '15 at 06:43
  • Yup. The .xml file before load into ADODB.Stream is in unicode encoded with all character are visible/shown normally. Just now I found a new way to solve my problem. Instead of 'convert to utf-8' the encode, I use 'save as to utf-8' approach. This is easier and my code becomes shorter since there is no need to use `FileSystemObject`. Just put my code from `i=11` until `Loop` in between `stream.Open` and `stream.Close`. But anyway, thanks again for your reply @Remy – squall leonhart Jan 07 '15 at 08:35
0

After make some research, finally I find the solution. I need to load my unicode file using LoadFromFile and make the stream to read the content using stream.ReadText and then close it first. And then I need to open back the stream, write back the content into utf-8using stream.WriteText and save it using SaveToFile and then close it permanently. Below is the code. Actually I get the reference from Use "ADODB.Stream" to convert ANSI to UTF-8, miss 1-2 character in the first row.

Set stream = CreateObject("ADODB.Stream")
            stream.Type = 2
            stream.Charset = "unicode"
            stream.Open
            stream.LoadFromFile xmlFile
            strText = stream.ReadText
            stream.Close

            stream.Type = 2
            stream.Charset = "utf-8"
            stream.Open
            stream.WriteText strText
            stream.SaveToFile xmlFile, 2
            stream.Close
            Set stream = Nothing
Community
  • 1
  • 1
squall leonhart
  • 301
  • 1
  • 5
  • 16