2

I have a table like this one:

ID Code CodeType CodeExt
1 -S_P&S_A S_P A+B
1 -S_P&S_A S_A C
2 S_P/S_A/S_Z S_P A+B+C
2 S_P/S_A/S_Z S_A D
2 S_P/S_A/S_Z S_Z E
3 S_P S_P A

The expected output is

ID Code_new
1 -(A+B)&(C)
2 (A+B+C)/(D)/(E)
3 (A)

So I want the column Code to be replaced by CodeExt depending on CodeType.

What I tried is

SELECT ID, REPLACE(Code,CodeType,'('+CodeExt+')') AS Code_new
FROM table

but this doesn't deliver the expected output.

Anonymous
  • 835
  • 1
  • 5
  • 21
jigga
  • 558
  • 4
  • 16

2 Answers2

4

Interesting problem. I think I have a working solution you can try. You might need to augment slightly if you have additional separators not shown in your sample data or probably more complex examples but it should give you something you can build on.

First split the code into separate rows on the delimiter (/ or &) by converting to a json array (this provides a sequence for ordering).

Then replace the code with its new value and re-aggregate, adding back the separator for all but the last row.

with n as (
  select ID, Value, seq, Replace(Value,CodeType,Concat('(',CodeExt,')')) NewValue,
    case when seq < Max(seq) over(partition by ID) then
      case when Code like '%&%' then '&' when Code like '%/%' then '/' else '' end
    end sep
  from t
  cross apply (
    select Value, [Key] seq 
    from OpenJson(Concat('["',Replace(Translate(code,'&/', ',,'),',','","'),'"]'))
  )s
)
select ID, String_Agg(Concat(NewValue, sep),'') within group(order by seq) Code_New
from n 
where Value != NewValue
group by ID;

Result:

enter image description here

See Working Fiddle Demo

Stu
  • 30,392
  • 6
  • 14
  • 33
4

Another option is to use a recursive CTE.

The way this works is to number each row per Code, then take the first row and do the replacement, then recursively take all of them until there are none left.

This would be more efficient if each row of replacements is numbered (partitioned by Code), as then you can avoid running the row-numbers each time.

WITH numbered AS (
    SELECT *,
      rn = ROW_NUMBER() OVER (PARTITION BY t.Code ORDER BY t.CodeType),
      IsLast = CASE WHEN LEAD(t.CodeType) OVER (PARTITION BY t.Code ORDER BY t.CodeType) IS NULL THEN 1 END
    FROM t
),
cte AS (
    SELECT
      n.ID,
      n.IsLast,
      Code_new = REPLACE(n.Code, n.CodeType, '(' + n.CodeExt + ')'),
      Level = 1
    FROM numbered n
    WHERE n.rn = 1
    
    UNION ALL
    
    SELECT
      cte.ID,
      n.IsLast,
      REPLACE(cte.Code_new, n.CodeType, '(' + n.CodeExt + ')'),
      cte.Level + 1
    FROM cte
    JOIN numbered n ON n.ID = cte.ID AND n.rn = cte.Level + 1
)
SELECT
  cte.ID,
  cte.Code_new
FROM cte
WHERE cte.IsLast = 1;

db<>fiddle

Charlieface
  • 52,284
  • 6
  • 19
  • 43
  • Thank you so much for the solution, it works! But I have 1.8 billion rows and the solution on Stu runs faster. That's why I accepted his answer – jigga May 30 '22 at 09:10
  • Yes I would expect so, recursive CTEs are not very efficient, but they can be simpler to write. 1.8bln rows you have to use the most efficient – Charlieface May 30 '22 at 09:32