Teradata help in removing the multiple values from text

Question

Need help in achieving the output provided below.Using the below query but its not removing the duplicate values if there are multiple values coming :

select databasename, tablename,
CASE when LENGTH(commentstring) > 0 and  (upper(CommentString) like 'AB - %' or  
upper(CommentString) like  'AB-%' or  upper(CommentString) like  'ABA-%'
or  upper(CommentString) like  'AB_%') then 
(cast(regexp_substr(commentstring, '(\b[A-Za-z0-9._%+-]+@[A-Za-z0-9.-]+\.[A-Za-z] 
{2,6}\b)', 1, 1, 'i') as varchar(128))        
|| coalesce(';' ||cast(regexp_substr(commentstring, '(\b[A-Za-z0-9._%+-]+@[A-Za-z0- 
9.-]+\.[A-Za-z]{2,6}\b)', 1, 2, 'i')as varchar(128)), '') 
|| coalesce(';' ||cast(regexp_substr(commentstring, '(\b[A-Za-z0-9._%+-]+@[A-Za-z0- 
9.-]+\.[A-Za-z]{2,6}\b)', 1, 3, 'i')as varchar(128)), '') 
|| coalesce(';' ||cast(regexp_substr(commentstring, '(\b[A-Za-z0-9._%+-]+@[A-Za-z0- 
9.-]+\.[A-Za-z]{2,6}\b)', 1, 4, 'i')as varchar(128)), '')  
|| coalesce(';' ||cast(regexp_substr(commentstring, '(\b[A-Za-z0-9._%+-]+@[A-Za-z0- 
9.-]+\.[A-Za-z]{2,6}\b)', 1, 5, 'i')as varchar(128)), '')) end as emailid
from tablesv

Input :

test1,sample1,abc@test.com;cde@test.com
test2,sample2,fgh@test.com;fgh@test.com,ghi@test.com
test3,sample3,hij@test.com;jkl@test.com;hij@test.com

Ouput: With the above query, not able to remove the duplicate email id.I need to keep one email if its duplicate in row.

test1,sample1,abc@test.com;cde@test.com
test2,sample2,fgh@test.com;ghi@test.com
test3,sample3,hij@test.com;jkl@test.com

Andrew · Answer 1 · 2023-07-13T18:42:20.253

Assuming you have some sort of key column, I would split that column using strtok_split_to_table, get the distinct values, and them put them back together. In your examples, you seem to have 2 delimiters, so I used oreplace to make them all a comma.

EDIT - (based on comment) - we'll derive a numeric key to make it easier, although you might be able to just concatenate databasename and tablename and use that. But the point is, strtok_split_to_table can only take one column for the "key".

  with cte as 
    (select
    top 1 databasename, 
tablename,
'test2,sample2,fgh@test.com,fgh@test.com,ghi@test.com' as commentstring,
row_number() over (order by databasename,tablename) as rn
    from
    dbc.tablesv
    order by databasename,tablename
    )
    
    select 
    cte.databasename,cte.tablename,
    TRIM(TRAILING ',' FROM (XMLAGG(parsed_val || ',' ORDER BY parsed_val) (VARCHAR(500))))
    from
    (
    select distinct
    somekey,
    parsed_val
    from
    table (strtok_split_to_table(cte.rn,oreplace(cte.commentstring,';',','),',')
    returns (somekey integer,token_num integer,parsed_Val varchar(100))) as t
    ) t1
    inner join cte
    on t1.somekey = cte.rn
    group by
    cte.databasename,cte.tablename

Hi Andrew,Databasename and tablename are primary keys.Sorry thats mistake one delimeter need to be there i.e ;.How to implement in the above code Andrew? — Sak, Jul 13 '23 at 18:06
Hi Andrew,Thank you for your help.I have tried running the query but its throwing error :Query execution failed Reason: If InKey is Decimal then OutKey should be DECIMAL and with same precision. — Sak, Jul 14 '23 at 09:18

dnoeth · Answer 2 · 2023-07-17T20:41:03.483

If you got a limited number of emails you can use this approach:

WITH cte AS
 ( SELECT ... -- other columns
      --split into emails (in a Derived Table/CTE) 
     ,coalesce(regexp_substr(commentstring, '(\b[A-Za-z0-9._%+-]+@[A-Za-z0-9.-]+\.[A-Za-z]{2,6}\b)', 1, 1, 'i'), '') as sub1
     ,coalesce(regexp_substr(commentstring, '(\b[A-Za-z0-9._%+-]+@[A-Za-z0-9.-]+\.[A-Za-z]{2,6}\b)', 1, 2, 'i'), '') as sub2
     ,coalesce(regexp_substr(commentstring, '(\b[A-Za-z0-9._%+-]+@[A-Za-z0-9.-]+\.[A-Za-z]{2,6}\b)', 1, 3, 'i'), '') as sub3
     ,coalesce(regexp_substr(commentstring, '(\b[A-Za-z0-9._%+-]+@[A-Za-z0-9.-]+\.[A-Za-z]{2,6}\b)', 1, 4, 'i'), '') as sub4
     ,coalesce(regexp_substr(commentstring, '(\b[A-Za-z0-9._%+-]+@[A-Za-z0-9.-]+\.[A-Za-z]{2,6}\b)', 1, 5, 'i'), '') as sub5
   FROM tablesv
 )
SELECT ... -- other columns
-- and then use brute force to find the unique values 
  ,sub1
   || case when sub2 not in (sub1)                then sub2 else '' end
   || case when sub3 not in (sub1,sub2)           then sub3 else '' end
   || case when sub4 not in (sub1,sub2,sub3)      then sub4 else '' end
   || case when sub5 not in (sub1,sub2,sub3,sub4) then sub5 else '' end
FROM cte

Hi dnoeth,really sorry I am new to this when you mention in derived table/cte how to do it.Do i need to create different table for this.Is there way I can use this in select itself? — Sak, Jul 14 '23 at 15:38
Sorry for the late answer, both Derived Tables and Common Table Expressions are nested Selects. I changed my answer showing a Derived Table. — dnoeth, Jul 17 '23 at 20:42

Teradata help in removing the multiple values from text

2 Answers2