4

I'm having serious problems with string-handling. As my problems are rather hard to describe, I will start with some demo code reproducing them:

Dim s1 As String = "hi"
Dim c(30) As Char
c(0) = "h"
c(1) = "i"
Dim s2 As String = CStr(c)
s2 = s2.Trim()
If Not s1 = s2 Then
   MsgBox(s1 + " != " + s2 + Environment.NewLine + _
          "Anything here won't be printed anyway..." + Environment.NewLine + _
          "s1.length: " + s1.Length.ToString + Environment.NewLine + _
          "s2.length: " + s2.Length.ToString + Environment.NewLine)
End If

The result messagebox looks like this:

screenshot of the messagebox showing only hi != hi but not the rest of the text

The reason that this comparison fails is that s2 has the length 31 (from the original array-size) while s1 has the length 2.

I stumble over this kind of problem quite often when reading string-information out of byte-arrays, for example when handling ID3Tags from MP3s or other encoded (ASCII, UTF8, ...) information with pre-specified length.

Is there any fast and clean way to prevent this problem?

What is the easiest way to "trim" s2 to the string shown by the debugger?

Mukyuu
Janis

4 Answers

7

I changed the variable names for clarity:

Dim myChars(30) As Char
myChars(0) = "h"c           ' cannot convert string to char
myChars(1) = "i"c           ' under option strict (narrowing)
Dim myStrA As New String(myChars)
Dim myStrB As String = CStr(myChars)

The short answer is this:

Under the hood, strings are character arrays. The last two lines both create a string, one using .NET code (the String constructor), the other a VB function (CStr). The thing is that, although the array has 31 elements, only 2 were initialized.

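The rest are null/Nothing, which for a Char means Chr(0) or NUL. A .NET String stores its length separately, so NUL does not end it, but the native Windows code behind MessageBox and the debugger display treats NUL as a C-style terminator. As a result, only the characters up to the first NUL show up in a MessageBox, the output window etc., and text appended to the string will not display either.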


Concepts

Since the strings above are created directly from a char array, their length is that of the original array. NUL is a valid char, so the NULs become part of the string:

Console.WriteLine(myStrA.Length)     ' == 31

So, why doesn't Trim remove the nul characters? MSDN (and Intellisense) tells us:

[Trim] Removes all leading and trailing white-space characters from the current String object.

The trailing null/Chr(0) characters are not white-space like Tab, Lf, Cr or Space, but control characters.

However, String.Trim has an overload which allows you to specify the characters to remove:

myStrA = myStrA.Trim(Convert.ToChar(0))
' using VB namespace constant
myStrA = myStrA.Trim(Microsoft.VisualBasic.ControlChars.NullChar)

You can specify multiple chars:

' nuls and spaces:
myStrA = myStrA.Trim(Convert.ToChar(0), " "c)
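Trim only removes characters at the two ends of the string. If a buffer could also hold leftover data after the first NUL, cutting at the first NUL gets closer to what the debugger shows. The following is a minimal sketch of that idea, not part of the original answer; the helper name CutAtNul is made up for illustration:

```vbnet
Imports System
Imports Microsoft.VisualBasic

Module NulCutSketch
    ' Hypothetical helper: returns the text before the first NUL,
    ' or the whole string when it contains no NUL at all.
    Function CutAtNul(ByVal text As String) As String
        Dim pos As Integer = text.IndexOf(ControlChars.NullChar)
        If pos < 0 Then Return text
        Return text.Substring(0, pos)
    End Function

    Sub Main()
        Dim chars(30) As Char          ' 31 elements, all NUL at first
        chars(0) = "h"c
        chars(1) = "i"c
        Dim padded As New String(chars)

        Console.WriteLine(padded.Length)              ' 31
        Console.WriteLine(CutAtNul(padded).Length)    ' 2
        Console.WriteLine(CutAtNul(padded) = "hi")    ' True
    End Sub
End Module
```

For strings that are only padded with NULs at the end, this gives the same result as Trim(ControlChars.NullChar).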

Strings can be indexed / iterated as a char array:

    For n As Int32 = 0 To myStrA.Length - 1    ' last valid index is Length - 1
        Console.WriteLine("{0} is '{1}'", n, myStrA(n))  ' or myStrA.Chars(n)
    Next

0 is 'h'
1 is 'i'
2 is '

(The output window will not even print the trailing CRLF.) You cannot change the string's char array to change the string data however:

   myStrA(2) = "!"c

This will not compile because strings are immutable: the Chars property is read-only.
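If a modified string is needed, one common approach is to edit a copy of the characters and build a new string from it. This is a small sketch with a hypothetical helper name (ReplaceCharAt), not code from the original answer:

```vbnet
Imports System

Module EditCopySketch
    ' Hypothetical helper: returns a new string with one character replaced.
    ' The source string itself is never modified.
    Function ReplaceCharAt(ByVal source As String, ByVal index As Integer, ByVal replacement As Char) As String
        Dim buffer() As Char = source.ToCharArray()   ' independent copy of the characters
        buffer(index) = replacement
        Return New String(buffer)
    End Function

    Sub Main()
        Dim original As String = "hi?"
        Dim changed As String = ReplaceCharAt(original, 2, "!"c)
        Console.WriteLine(original)   ' hi?
        Console.WriteLine(changed)    ' hi!
    End Sub
End Module
```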

See also:

ASCII table

Ňɏssa Pøngjǣrdenlarp
2

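If you want to create strings from a byte array, e.g. an ID3v2.4.0 text field with ISO-8859-1 encoding, then this should work as long as the bytes are plain ASCII (Encoding.ASCII turns any byte above 127 into "?"):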

    Dim s1 As String = "Test"
    Dim b() As Byte = New Byte() {84, 101, 115, 116, 0, 0, 0}
    Dim s2 As String = System.Text.Encoding.ASCII.GetString(b).Trim(ControlChars.NullChar)

    If s1 = s2 Then Stop

According to http://id3.org/id3v2.4.0-structure, other encodings may be present, and the code would need to be adjusted if one of them is used.
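As a rough sketch of such an adjustment: ID3v2.4 text frames start with an encoding byte ($00 ISO-8859-1, $01 UTF-16 with BOM, $02 UTF-16BE without BOM, $03 UTF-8). The helper below is hypothetical (the name DecodeId3Text is not from any library) and assumes that data holds only the text bytes that follow the encoding byte:

```vbnet
Imports System
Imports System.Text
Imports Microsoft.VisualBasic

Module Id3TextSketch
    ' Hypothetical helper: picks a decoder from the ID3v2.4 encoding byte
    ' and strips trailing NUL terminators/padding from the result.
    Function DecodeId3Text(ByVal encodingByte As Byte, ByVal data() As Byte) As String
        Dim enc As Encoding
        Dim offset As Integer = 0
        Select Case encodingByte
            Case 0
                enc = Encoding.GetEncoding("iso-8859-1")
            Case 1
                ' UTF-16 with BOM: FE FF means big endian, FF FE little endian.
                enc = Encoding.Unicode
                If data.Length >= 2 Then
                    If data(0) = &HFE AndAlso data(1) = &HFF Then
                        enc = Encoding.BigEndianUnicode
                        offset = 2
                    ElseIf data(0) = &HFF AndAlso data(1) = &HFE Then
                        offset = 2
                    End If
                End If
            Case 2
                enc = Encoding.BigEndianUnicode
            Case 3
                enc = Encoding.UTF8
            Case Else
                Throw New ArgumentException("Unknown ID3v2.4 text encoding byte")
        End Select
        Return enc.GetString(data, offset, data.Length - offset).TrimEnd(ControlChars.NullChar)
    End Function

    Sub Main()
        Dim latin() As Byte = {84, 101, 115, 116, 0, 0, 0}
        Console.WriteLine(DecodeId3Text(0, latin))    ' Test

        Dim utf16() As Byte = {&HFF, &HFE, &H68, 0, &H69, 0, 0, 0}
        Console.WriteLine(DecodeId3Text(1, utf16))    ' hi
    End Sub
End Module
```

Frames that contain several NUL-separated strings, and older tag versions such as ID3v1, are outside the scope of this sketch.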

dbasnett
  • ID3 is just one of many examples where I have to convert byte-arrays to strings. Still the ControlChars.NullChar is nice to know, thanks :) – Janis Jun 25 '14 at 11:18
  • I was also trying to point out how encoding could be used to get the strings in the first place since you didn't show how you did it in your example. OT - when you upvoted my answer I saw a +10 in my inbox, but my total points did not change. – dbasnett Jun 25 '14 at 11:54
  • OT: Then I guess that the programmers of this programming forum made a programming mistake ;) – Janis Jun 25 '14 at 21:04
1

The cause is that CStr(c) is treating the NUL (0) characters as members of the resulting string instead of a string-terminator. The base String.Trim() fails to work because it does not consider NUL characters as white-space.

One way to avoid this problem is to only convert the characters (or bytes) up to the first NUL (or 0); the TakeWhile function is useful in this case.

Const NUL As Char = Microsoft.VisualBasic.ChrW(0)
' TakeWhile is a LINQ extension method (needs Imports System.Linq)
Dim cleanChars() As Char = _
    c.TakeWhile(Function(v) v <> NUL) _
     .ToArray()

Dim clean As String = CStr(cleanChars) ' -> "hi"

If the data really comes from Bytes (and not Chars), it might be prudent to switch to Encoding.GetString so the encoding/process is explicit and well-understood, e.g.

Encoding.UTF8.GetString(cleanBytes) ' -> still "hi"
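
For a fixed-length byte field padded with zeros, the two ideas can be combined: find the first zero byte, then decode only the bytes before it. This is a minimal sketch with a hypothetical helper name (DecodeFixedField). It assumes an encoding in which a zero byte can only be NUL (ASCII, ISO-8859-1, UTF-8); it would not be correct for UTF-16.

```vbnet
Imports System
Imports System.Text

Module FixedFieldSketch
    ' Hypothetical helper: decodes a zero-padded fixed-length field
    ' up to (not including) the first zero byte.
    Function DecodeFixedField(ByVal field() As Byte, ByVal enc As Encoding) As String
        Dim count As Integer = Array.IndexOf(field, CByte(0))
        If count < 0 Then count = field.Length    ' no padding: use the whole field
        Return enc.GetString(field, 0, count)
    End Function

    Sub Main()
        Dim title(29) As Byte                     ' 30-byte field, all zeros at first
        Encoding.ASCII.GetBytes("Test", 0, 4, title, 0)
        Dim decoded As String = DecodeFixedField(title, Encoding.ASCII)
        Console.WriteLine(decoded)                ' Test
        Console.WriteLine(decoded.Length)         ' 4
    End Sub
End Module
```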
user2864740
  • If coming from arbitrary Bytes, you need to pick a character set and encoding that has one byte per character and has all 256 byte values as valid values. UTF-8 and Windows-1252 won't do. CP437 will. The property of having characters that .NET encodes in exactly one Char (or String position) is also helpful, though I can't think of any character set with an encoding that meets the previous conditions that doesn't also meet the latter condition. – Tom Blodget Jun 25 '14 at 02:17
  • @TomBlodget If not mapping from ASCII then the OP is on his/her own (or rather, that's a separate question) - the selection of UTF8 was for an example but, as pointed out, is not always applicable. – user2864740 Jun 25 '14 at 03:03
  • The TakeWhile-Function seems to do just the same as when I loop through the byte-array (or char-array) myself until finding the first 0 (or CChar(0)). I personally prefer the manual code, as I know what exactly it does and as it is easier to understand for whoever else might use my code. Also it is just as short. Anyway thanks for the info :) – Janis Jun 25 '14 at 10:22
  • @Janis I prefer the LINQ/HoF method as it can make many tasks easier at a conceptual level once it becomes a familiar tool. In any case, the chosen implementation can be hidden behind a method - and, you're welcome :) – user2864740 Jun 25 '14 at 10:24
  • As for the encoding: It is sometimes not specified (ID3Tags) which encoding is used. In all other cases I agree that it is the best way to do a conversion, as the conversion works in both directions in a standardized and well known way. The problem still is, if the byte-field for a string has a fixed length (e.g. song-title in ID3Tags has exactly 30 bytes). Then the old problem with padding-zeros will cause the exact same problems as posted in my original post, even with Encoding.GetString. – Janis Jun 25 '14 at 10:29
  • @user2864740: I use LINQ for some tasks, when I consider it easier / faster than the classic VB-code. But in this case... just nahhh :D That is the reason why I accepted Plutonix classic solution as answer to this thread. But thanks again for all the information, your posts were very helpful anyway :) – Janis Jun 25 '14 at 10:40
0

You can either Dim or ReDim the char array once you know the length of the s1 string.

Dim s1 As String
s1 = "hi"
Dim c(s1.Length - 1) As Char   ' VB takes the upper bound, so this is s1.Length elements
c(0) = "h"c
c(1) = "i"c
Dim s2 As String = CStr(c)

And now your comparison will work no matter the length of the original string. You didn't state whether the length of 30 for 'c' is a requirement or not here.

But even if it was, you'd still need to either expand or contract the array to have the same CStr length to do your comparison.

So even after declaring

Dim c(30) As Char

You can later in the code block redimension the array like this

ReDim c(s1.Length - 1) 'Or any upper bound you like

A plain ReDim clears the array. To keep the current contents (as far as they fit the new size), whether growing or shrinking, precede it with the Preserve keyword.

ReDim Preserve c(s1.Length - 1)
G2Bam