0

I am needing some help fine tuning a T-SQL function to properly DECODE a string which contains URL. Just the query string parameter values are URLEcoded (and not the whole URL). The original function works well when decoding single byte characters however it doesn't handle multiple byte characters. In order to address the DECODING of multiple byte characters such as Spanish Accented characters; my plan was to find the values using PATINDEX and replace those using a lookup table (this is because we are dealing with a small finite number of specials characters which fall in this category).

Problem: The Pattern specified below is not returning any matches therefore I am pretty much stuck at this point.

Example: Pattern '%[%][0-9a-f][0-9a-f]%' works for single byte encoded chars. Similarly Pattern '%[%][0-9a-f][0-9a-f][%][0-9a-f][0-9a-f][%][0-9a-f][0-9a-f][%][0-9a-f][0-9a-f]%' should work for double byte chars such as (%C3%A9 -> ) but it does not.

Here is my code:

 DECLARE @Position INT,
    @Base CHAR(16),
    @High TINYINT,
    @Low TINYINT,
    @Pattern VARCHAR(256),
    @URL VARCHAR(8000)

SET @Url = '%26Text1%3DFrom%20Ren%C3%A9%27s'

SELECT  @Base = '0123456789abcdef',
    @Pattern = '%[%][0-9a-f][0-9a-f][%][0-9a-f][0-9a-f]%',
    --@URL = REPLACE(@URL, '+', ' '),
    @Position = PATINDEX(@Pattern, @URL)

PRINT 'Position: ' + + CAST(@Position AS Varchar(256))

WHILE @Position > 0
    BEGIN
    SELECT  
        @High = CHARINDEX(SUBSTRING(@URL, @Position + 1, 1), @Base COLLATE Latin1_General_CI_AS),
        @Low = CHARINDEX(SUBSTRING(@URL, @Position + 2, 1), @Base COLLATE Latin1_General_CI_AS),
        @URL = STUFF(@URL, @Position, 6, '123456'),
        @Position = PATINDEX(@Pattern, @URL)

    PRINT 'High: ' + CAST(@High AS Varchar(256))

    PRINT @URL
END 
Robert
  • 724
  • 1
  • 11
  • 25
  • Are you sure you mean 4 sets from double byte? Cleary %C3%A9 is just two sets. – paparazzo Aug 21 '13 at 21:17
  • Pattern should match 2 sets of double byte chars. Here is my understanding: [%] matches the literal (i.e. '%') preceding 'C3'. [0-9a-f] matches a single AlphaNumeric charecter (i.e. 'C'). The repeated [0-9a-f] matches the second single AlphaNumeric character (i.e. 3) so [%][0-9a-f][0-9a-f] should match '%C3' and similarlyl the second set of pattern **should** match '%A9'. What am I missing? – Robert Aug 21 '13 at 21:39
  • OK but is %C3%A9 two sets of double bytes or one? – paparazzo Aug 21 '13 at 21:40
  • **%C3%A9** is a set of double bytes each encoded separately. In other words **%C3** alone equates to **Ã** and **%A9** alone equates to **©** but combined they should spit out **é** – Robert Aug 21 '13 at 21:46
  • "Similarly Pattern '%[%][0-9a-f][0-9a-f][%][0-9a-f][0-9a-f][%][0-9a-f][0-9a-f][%][0-9a-f][0-9a-f]%' should work for double byte chars such as (%C3%A9 -> ) but it does not." That pattern is two double bytes and %C3%A9 is one. – paparazzo Aug 21 '13 at 22:02
  • @Blam, I am not understanding how the pattern is flawed ... can you give me an example of how the pattern is not fashioned correctly? See my explanation / understanding of the pattern, above. – Robert Aug 21 '13 at 22:12
  • Let's count. The pattern has 4 (hard) % and %C3%A9 has 2. 4 != 2 – paparazzo Aug 22 '13 at 00:16
  • @Blam, the leading and trailing '%' are the wild-cards. – Robert Aug 22 '13 at 15:04
  • Count the %. The pattern has 4 [%] in addition to the staring and ending. The search string has only two % - %C3%A9. – paparazzo Aug 22 '13 at 17:43
  • @Blam, see my answer below. It has the correct pattern ... what you are referring to (pattern listed within the comment above) was a typo. Thanks for all your help!! – Robert Aug 22 '13 at 19:06

2 Answers2

-1

@Pattern CHAR(21) is truncating

see

set nocount on
 DECLARE @Position INT,
    @Base CHAR(16),
    @High TINYINT,
    @Low TINYINT,
    @Pattern VARCHAR(200),
    @URL VARCHAR(8000)

SET @Url = '%26Text1%3DFrom%20Ren%C3%A9%27s'

SELECT  @Base = '0123456789abcdef',
    @Pattern = '%[%][0-9a-f][0-9a-f][%][0-9a-f][0-9a-f]%',
    --@URL = REPLACE(@URL, '+', ' '),
    @Position = PATINDEX(@Pattern, @URL)

select @URL
select @Pattern    
select @position
paparazzo
  • 44,497
  • 23
  • 105
  • 176
-1

I had a syntax error within the pattern. After reading the documentation closely I realized that I needed to escape % sign using an additional percent sign. Here is the working solution (the sub query to replace the value is not working but the pattern is):

    DECLARE @Position INT,
    @Base CHAR(16),
    @High TINYINT,
    @Low TINYINT,
    @Pattern nVARCHAR(256),
    @ToReplace nVARCHAR(256),
    @ReplaceWith nVARCHAR(256),
    @URL nVARCHAR(4000)

SET @Url = '%26Text1%3DFrom%20Ren%C3%A9%27s%C3'

SELECT  @Base = '0123456789abcdef',
    @Pattern = '%[%%][c-f][0-9]%%[0-9a-f]%',
    --@URL = REPLACE(@URL, '+', ' '),
    @Position = PATINDEX(@Pattern, @URL)

  PRINT 'Position: ' + + CAST(@Position AS Varchar(256))

WHILE @Position > 0
  BEGIN
  SELECT  
        @High = CHARINDEX(SUBSTRING(@URL, @Position + 1, 1), @Base COLLATE Latin1_General_CI_AS),
        @Low = CHARINDEX(SUBSTRING(@URL, @Position + 2, 1), @Base COLLATE Latin1_General_CI_AS),
        @ToReplace = SUBSTRING(@URL, @Position, 6),
        @ReplaceWith = (SELECT COALESCE([Text], 'Something') FROM dbo.ExtendedAsciiLookup WHERE UTF = @ToReplace),
        @URL = STUFF(@URL, @Position, 6, @ReplaceWith),
        @Position = PATINDEX(@Pattern, @URL)

        PRINT 'High: ' + CAST(@High AS Varchar(256))
        PRINT '@ToReplace: ' + CAST(COALESCE(@ToReplace,'') AS nVARCHAR(256))
        PRINT 'With: ' + CAST(COALESCE(@ReplaceWith,'') AS VARCHAR(256))

        PRINT @URL
END
Robert
  • 724
  • 1
  • 11
  • 25