1

We are trying to strip certain special characters from a string using the (simplified) command below, which is the most common solution we've seen after searching. But, the results are inconsistent when using certain special characters. Can anyone explain why? And, better, can anyone offer a solution that works?

SQL Server 2014

In the first case below, the '@' is removed, but in all the other cases where it is included (2+5) it is not removed. Same for the 3rd case: the spaces are removed, but not the '&'; and in the 5th case, the spaces are removed, but not the '@'. Other combinations have similar issues.

Thanks. declare @str varchar(50) = '1st Ave @ 1st St FL-3 Rm 323& New York NY'

declare @Pindex1  varchar(10) = '%[@]%'
declare @Pindex2  varchar(10) = '%[@& ]%'
declare @Pindex3  varchar(10) = '%[& ]%'
declare @Pindex4  varchar(10) = '%[ ]%'
declare @Pindex5  varchar(10) = '%[@ ]%'

Select @str as String, @Pindex1 as Pattern ,Replace(@str, Substring(@str, PatIndex(@Pindex1,@str), 1), '') as PIndex1_result
Select @str as String, @Pindex2 as Pattern ,Replace(@str, Substring(@str, PatIndex(@Pindex2,@str), 1), '') as PIndex2_result
Select @str as String, @Pindex3 as Pattern ,Replace(@str, Substring(@str, PatIndex(@pindex3,@str), 1), '') as PIndex3_result
Select @str as String, @Pindex4 as Pattern ,Replace(@str, Substring(@str, PatIndex(@Pindex4,@str), 1), '') as PIndex4_result
Select @str as String, @Pindex5 as Pattern,Replace(@str, Substring(@str, PatIndex(@pindex5,@str), 1), '') as PIndex5_result
LBerg
  • 33
  • 2
  • 8
  • Not sure what you expected. Most of your patterns can match several possible characters, `patindex` each time finds the first match, you then `replace` that one found character from the string. Obviously the other possible matches (other characters the pattern could match) down the string remain unchanged. Your second pattern is invalid (tries to define a range from `@` to `space`, should be `'%[- @]%'`), the last pattern is also invalid (should be `'%[ @]%'`), but fixing the patterns will not solve the problem of only materializing the pattern once. – GSerg May 15 '17 at 17:17
  • You need to loop until `PatIndex` finds no more matches. – GSerg May 15 '17 at 17:18
  • From what I've seen, looping is unnecessary - it matches any instance of a character inside the [ ], repeatedly. (I'm not trying to use a range function. That was a mistake in my example. See my edited question.) That is what happens with a variety of characters in our testing, but not with the '@' character. – LBerg May 15 '17 at 21:26

1 Answers1

1

I think you might have a misunderstanding of SQL Servers patterns. Consider the second pattern:

declare @Pindex2  varchar(10) = '%[@- ]%'

This can match no pattern. Why? The ASCII value of '@' is 64 and for a space it is 32. Nothing occurs between these values. It is similar to '%[b-a]%', which would also match nothing.

I think the issue is your understanding of SQL Server string patterns.

Gordon Linoff
  • 1,242,037
  • 58
  • 646
  • 786
  • I see the issue with '-' character now. I wasn't trying for ranges (which is what the '-' does). I was trying for each of the characters inside the [ ] to be replaced. I've edited my examples a little to show this problem better. – LBerg May 15 '17 at 18:05