Replacing a string over multiple rows

Question

I have a table like this one:

ID	Code	CodeType	CodeExt
1	-S_P&S_A	S_P	A+B
1	-S_P&S_A	S_A	C
2	S_P/S_A/S_Z	S_P	A+B+C
2	S_P/S_A/S_Z	S_A	D
2	S_P/S_A/S_Z	S_Z	E
3	S_P	S_P	A

The expected output is

ID	Code_new
1	-(A+B)&(C)
2	(A+B+C)/(D)/(E)
3	(A)

So I want the column Code to be replaced by CodeExt depending on CodeType.

What I tried is

SELECT ID, REPLACE(Code,CodeType,'('+CodeExt+')') AS Code_new
FROM table

but this doesn't deliver the expected output.

SQL Server 2019... Didn't knew there was another platform for T-SQL... Sorry about that — jigga, May 26 '22 at 19:59

Stu · Accepted Answer · 2022-05-26T21:00:08.100

Interesting problem. I think I have a working solution you can try. You might need to augment slightly if you have additional separators not shown in your sample data or probably more complex examples but it should give you something you can build on.

First split the code into separate rows on the delimiter (/ or &) by converting to a json array (this provides a sequence for ordering).

Then replace the code with its new value and re-aggregate, adding back the separator for all but the last row.

with n as (
  select ID, Value, seq, Replace(Value,CodeType,Concat('(',CodeExt,')')) NewValue,
    case when seq < Max(seq) over(partition by ID) then
      case when Code like '%&%' then '&' when Code like '%/%' then '/' else '' end
    end sep
  from t
  cross apply (
    select Value, [Key] seq 
    from OpenJson(Concat('["',Replace(Translate(code,'&/', ',,'),',','","'),'"]'))
  )s
)
select ID, String_Agg(Concat(NewValue, sep),'') within group(order by seq) Code_New
from n 
where Value != NewValue
group by ID;

Result:

See Working Fiddle Demo

score 4 · Answer 2 · answered May 26 '22 at 23:21

Another option is to use a recursive CTE.

The way this works is to number each row per Code, then take the first row and do the replacement, then recursively take all of them until there are none left.

This would be more efficient if each row of replacements is numbered (partitioned by Code), as then you can avoid running the row-numbers each time.

WITH numbered AS (
    SELECT *,
      rn = ROW_NUMBER() OVER (PARTITION BY t.Code ORDER BY t.CodeType),
      IsLast = CASE WHEN LEAD(t.CodeType) OVER (PARTITION BY t.Code ORDER BY t.CodeType) IS NULL THEN 1 END
    FROM t
),
cte AS (
    SELECT
      n.ID,
      n.IsLast,
      Code_new = REPLACE(n.Code, n.CodeType, '(' + n.CodeExt + ')'),
      Level = 1
    FROM numbered n
    WHERE n.rn = 1
    
    UNION ALL
    
    SELECT
      cte.ID,
      n.IsLast,
      REPLACE(cte.Code_new, n.CodeType, '(' + n.CodeExt + ')'),
      cte.Level + 1
    FROM cte
    JOIN numbered n ON n.ID = cte.ID AND n.rn = cte.Level + 1
)
SELECT
  cte.ID,
  cte.Code_new
FROM cte
WHERE cte.IsLast = 1;

db<>fiddle

Thank you so much for the solution, it works! But I have 1.8 billion rows and the solution on Stu runs faster. That's why I accepted his answer — jigga, May 30 '22 at 09:10
Yes I would expect so, recursive CTEs are not very efficient, but they can be simpler to write. 1.8bln rows you have to use the most efficient — Charlieface, May 30 '22 at 09:32

Replacing a string over multiple rows

2 Answers2