
I'm trying to use .NET's Regex.Replace (the example here is in VB.NET) to remove every character from a string except letters and spaces. For the string s below, I expected the pattern [^A-z ] to remove everything that is not a letter or a space. However, the caret (^) is left in the output. What am I doing wrong?

Sub Try_Regex_Remove_Caret_Symbol()
    ' ^ (caret) character is not being removed via exclusion
    Dim s As String, p As String
    s = "I have a caret which I want removed ^$@#!&"
    p = "[^A-z ]"
    Console.WriteLine("Input : " & s)
    Console.WriteLine("Output: " & Regex.Replace(s, p, ""))
    ' Input : I have a caret which I want removed ^$@#!&
    ' Output: I have a caret which I want removed ^
    ' Note that the caret (^) is not removed as expected
End Sub

2 Answers


Here is the ASCII table (row = high hex digit, column = low hex digit):

       0  1  2  3  4  5  6  7  8  9  A  B  C  D  E  F
    2     !  "  #  $  %  &  '  (  )  *  +  ,  -  .  /
    3  0  1  2  3  4  5  6  7  8  9  :  ;  <  =  >  ?
    4  @  A  B  C  D  E  F  G  H  I  J  K  L  M  N  O
    5  P  Q  R  S  T  U  V  W  X  Y  Z  [  \  ]  ^  _
    6  `  a  b  c  d  e  f  g  h  i  j  k  l  m  n  o
    7  p  q  r  s  t  u  v  w  x  y  z  {  |  }  ~

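You'll notice that the caret (^, 0x5E) sits between Z (0x5A) and a (0x61), so the range A-z (0x41 to 0x7A) also covers [ \ ] ^ _ and `. Because those characters fall inside the negated class, the pattern never matches them and they are not removed. To get your desired effect, list the two letter ranges separately: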

p = "[^A-Za-z ]"
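
To see exactly which characters the A-z range picks up besides letters, here is a minimal sketch that walks the code points from A to z and collects every character that [A-z] matches but [A-Za-z] does not. The module and function names (RangeDemo, ExtraCharsInRange) are illustrative only.

```vbnet
Imports System
Imports System.Text
Imports System.Text.RegularExpressions

Module RangeDemo
    ' Returns the characters matched by [A-z] that are not ASCII letters.
    Function ExtraCharsInRange() As String
        Dim extras As New StringBuilder()
        For code As Integer = AscW("A"c) To AscW("z"c)
            Dim ch As String = ChrW(code).ToString()
            ' Keep the character if the wide range accepts it
            ' but the two explicit letter ranges do not.
            If Regex.IsMatch(ch, "^[A-z]$") AndAlso
               Not Regex.IsMatch(ch, "^[A-Za-z]$") Then
                extras.Append(ch)
            End If
        Next
        Return extras.ToString()
    End Function

    Sub Main()
        Console.WriteLine(ExtraCharsInRange()) ' prints [\]^_`
    End Sub
End Module
```

These six characters are the ones that [^A-z ] silently leaves in the string.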
Jan
MotKohn
  • Got it. That makes sense. Thank you. – darth_bloggs May 03 '17 at 11:53
  • I didn't realize the Regex spec relies on ASCII ordering -- functionally, that doesn't seem to make sense. If most human mathematicians specified the set of "everything that is not in A to z or space" (I now see that spec can be ambiguous), they would probably expect any special chars to be in that exclusion set regardless of their ordering in the ASCII table. Appears to be a quirk of the early PCRE implementation since Perl behaves this way too. – darth_bloggs May 03 '17 at 12:06

Use the following regex for replacement:

[^A-Za-z ]

The problem is that the caret sits between Z and a in ASCII, so the single range A-z includes it. The negated class [^A-z ] therefore never matches the caret, and it is left in the string.

Full code:

Dim s As String = "I have a caret which I want removed ^$@#!&"
Dim p As String = "[^A-Za-z ]"
Console.WriteLine("Input : " & s)
Console.WriteLine("Output: " & Regex.Replace(s, p, ""))
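
If the same cleanup is needed in more than one place, the corrected pattern could be wrapped in a small helper. This is only a sketch: KeepLettersAndSpaces and CleanDemo are hypothetical names, not part of .NET, and the helper keeps ASCII letters only.

```vbnet
Imports System
Imports System.Text.RegularExpressions

Module CleanDemo
    ' Hypothetical helper: removes everything except ASCII letters and spaces.
    Function KeepLettersAndSpaces(source As String) As String
        Return Regex.Replace(source, "[^A-Za-z ]", "")
    End Function

    Sub Main()
        Dim s As String = "I have a caret which I want removed ^$@#!&"
        ' The brackets make the trailing space in the result visible.
        Console.WriteLine("[" & KeepLettersAndSpaces(s) & "]")
        ' prints [I have a caret which I want removed ]
    End Sub
End Module
```

Note that accented letters such as é are removed as well. If they should be kept, the Unicode letter category \p{L} in a class like [^\p{L} ] is one option, at the cost of accepting letters from every script.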

Demo here: Rextester

Tim Biegeleisen