5

I'm reading text from a text file. The first string the text file has to read is "Algood ", and note the space. In Notepad, it appears that there is a space in this string, but it isn't. When I test the 6th (zero-based index) character in Visual Studio's QuickWatch, it appears as:

"�"c

When I use the Asc function to get the ASCII code, it tells me that the ASCII code is 63. 63 is a question mark. But when I test to see if the string contains ASCII 63, it tests false. So it appears that the string contains the character with the ASCII code 63, only it doesn't, it contains some other character which tests as ASCII code 63. This is a problem: I can't remove the character if I don't know what to call it. I could remove the last character, but not every string in the text file contains this character.

enter image description here

The question is: what is this character if not a question mark, and how can I uniquely identify so I can remove it?

Lou
  • 2,200
  • 2
  • 33
  • 66
  • 3
    Try `AscW` to get the Unicode value. – Raymond Chen Sep 14 '14 at 22:32
  • Well, we couldn't tell you what it is since you didn't post the file... open your hex editor to find out for sure. That question mark symbol shows up when you have a character not within the font. – Brad Sep 14 '14 at 22:33
  • @RaymondChen, it says "Cannot convert to integer" when I try to evaluate the character using `AscW`. – Lou Sep 14 '14 at 22:34
  • So long as I know how to remove it, I'm happy. That's solved my problem - if you want to put it as an answer I can accept :). – Lou Sep 14 '14 at 22:36

3 Answers3

7

It is the Unicode replacement character, U+FFFD, aka ChrW(&HFFFD).

Never use Asc() or Chr(), they are legacy VB6 functions that do not handle Unicode. Passing a fancy Unicode codepoint to Asc() always produces 63, the character code for "?"c, aka "I have no idea what you're saying". The exact same idea as"�"c but using an ASCII code instead.

Seeing the Black Diamond of Death back is always bad news, something went wrong when the string was converted from the underlying byte values. Because some byte values did not produce a valid character. Which is what you really should be looking for, you always want to avoid GIGO. Garbage In Garbage Out is an ugly data corruption problem that has no winners, only victims. You.

Hans Passant
  • 922,412
  • 146
  • 1,693
  • 2,536
  • I'm writing single use code, I'm not going to dig deeper. Your answer has solved my problem though, so I'm happy :). – Lou Sep 14 '14 at 22:39
  • Also, you have one extra F in `ChrW(&HFFFFD)`, I think it should be `ChrW(&HFFFD)` (that's how you wrote it first.) – Lou Sep 14 '14 at 22:40
  • I think the data was sourced from a website, asoiaf.westeros.org, directly copied into a spreadsheet, and then into a text file. The problem must be in the website, because I had the same problem in Calc. In any case, single-use code :P. – Lou Sep 14 '14 at 22:46
  • 1
    Sure, there are at least 6 places where this could have gone wrong, 3 of which you don't control. The last two are most suspect btw, text files are not very good at knowing the encoding of text. Unless it has a BOM, chronically missing. Also on the server-side :) – Hans Passant Sep 14 '14 at 22:50
  • 6 places? Could you elaborate, this is interesting� – Lou Sep 14 '14 at 23:05
  • 2
    Server-side: text file => dbase => web server. Client side: Internet -> browser => text file => your app. You can easily add more, converting data from one format into another keeps *lots* of programmers busy. – Hans Passant Sep 14 '14 at 23:11
  • Sorry it's five years late, just realised I didn't accept this! – Lou Oct 23 '19 at 15:02
2

I have wrote the following function in Excel VBA which will remove the "black diamond" for a single cell.

The hardest thing is to not loop each digit in all field to find it. I needed a method to identify the black diamond without check all digits of all fields.

I used a ADODB recordset, if the string is not accepted by the RS, it means it contains an invalid character. Then it looks for a ASC(63) = “?”, then it trims the cell down to without the black diamond.

The reason this work is when it loops through each digit in the string, it will recognize the black diamond as ASC = 63. If is a real question mark, it will be accepted by the RS.

Private Function Correct_Black_Diamond(ByVal First_Address As Variant) As String
    Dim CheckDigit As Integer
    Dim Temp_string As String
    Dim temp_Rs As New ADODB.Recordset
        temp_Rs.Fields.Append "address", adChar, 9999
        temp_Rs.Open

        temp_Rs.AddNew
            On Error GoTo Further_Address_Check
            temp_Rs!Address = First_Address
        temp_Rs.Update

        Correct_Black_Diamond = First_Address
    Exit Function

Further_Address_Check:
        For CheckDigit = 1 To Len(First_Address)
            If Asc(Mid(First_Address, CheckDigit, 1)) = 63 Then
                Temp_string = Trim(Mid(First_Address, 1, CheckDigit - 1)) & Trim(Mid(First_Address, CheckDigit + 1, Len(First_Address)))
            End If
        Next CheckDigit
        First_Address = Temp_string
        Correct_Black_Diamond = First_Address
        Exit Function

End Function
Ming Yuen
  • 21
  • 3
0

Use:

LDM_MSG.Replace(ChrW(8203), "") 

Instead of:

LDM_MSG.Replace(Chr(63), "")

It solves the problem.

RobC
  • 22,977
  • 20
  • 73
  • 80