-1

I am working with a field called codes that is a delimited list of values, separated by commas. Within each item there is a title ending in a colon and then a code number following the colon. I want a list of only the code numbers after each colon.

Example Value: name-form-na-stage0:3278648990379886572,rules-na-unwanted-sdfle2:6886328308933282817,us-disdg-order-stage1:1273671130817907765

Desired Output: 3278648990379886572,6886328308933282817,1273671130817907765

The title does always start with a letter and the end with a colon so I can see how REGEXP_REPLACE might work to replace any string between starting with a letter and ending with a colon with '' might work but I am not good at REGEXP_REPLACE patterns. Chat GPT is down fml.

Side note, if anyone knows of a good guide for understanding pattern notation for regular expressions it would be much appreciated!

I tried this and it is not working REGEXP_REPLACE(REPLACE(REPLACE(codes,':', ' '), ',', ' ') ,' [^0-9]+ ', ' ')

MJ B
  • 19
  • 4
  • Alternatively, you could [Create a user-defined function - Amazon Redshift](https://docs.aws.amazon.com/redshift/latest/dg/user-defined-functions.html) and write the logic in Python. – John Rotenstein Feb 02 '23 at 00:05

1 Answers1

0

This solution assumes a few things:

  • No colons anywhere else except immediately before the numbers
  • No number at the very start

At a high level, this query finds how many colons there are, splits the entire string into that many parts, and then only keeps the number up to the comma immediately after the number, and then aggregates the numbers into a comma-delimited list.

Assuming a table like this:

create temp table tbl_string (id int, strval varchar(1000));
insert into tbl_string
values
(1, 'name-form-na-stage0:3278648990379886572,rules-na-unwanted-sdfle2:6886328308933282817,us-disdg-order-stage1:1273671130817907765');



with recursive cte_num_of_delims AS (
    select max(regexp_count(strval, ':')) AS num_of_delims
    from tbl_string
), cte_nums(nums) AS (
    select 1 as nums
    union all
    select nums + 1
    from cte_nums
    where nums <= (select num_of_delims from cte_num_of_delims)
), cte_strings_nums_combined as (
    select id,
           strval,
           nums as index
    from cte_nums
    cross join tbl_string
), prefinal as (
    select *,
           split_part(strval, ':', index) as parsed_vals
    from cte_strings_nums_combined
    where parsed_vals != ''
    and index != 1
), final as (
    select *,
           case
               when charindex(',', parsed_vals) = 0
                   then parsed_vals
               else left(parsed_vals, charindex(',', parsed_vals) - 1)
               end as final_vals
    from prefinal
)
select listagg(final_vals, ',')
from final
dfundako
  • 8,022
  • 3
  • 18
  • 34
  • Ok, thank you for this. I was hoping to avoid parsing the list to get the desired result but perhaps I will have to. – MJ B Feb 01 '23 at 22:53