0

Need help in achieving the output provided below.Using the below query but its not removing the duplicate values if there are multiple values coming :

select databasename, tablename,
CASE when LENGTH(commentstring) > 0 and  (upper(CommentString) like 'AB - %' or  
upper(CommentString) like  'AB-%' or  upper(CommentString) like  'ABA-%'
or  upper(CommentString) like  'AB_%') then 
(cast(regexp_substr(commentstring, '(\b[A-Za-z0-9._%+-]+@[A-Za-z0-9.-]+\.[A-Za-z] 
{2,6}\b)', 1, 1, 'i') as varchar(128))        
|| coalesce(';' ||cast(regexp_substr(commentstring, '(\b[A-Za-z0-9._%+-]+@[A-Za-z0- 
9.-]+\.[A-Za-z]{2,6}\b)', 1, 2, 'i')as varchar(128)), '') 
|| coalesce(';' ||cast(regexp_substr(commentstring, '(\b[A-Za-z0-9._%+-]+@[A-Za-z0- 
9.-]+\.[A-Za-z]{2,6}\b)', 1, 3, 'i')as varchar(128)), '') 
|| coalesce(';' ||cast(regexp_substr(commentstring, '(\b[A-Za-z0-9._%+-]+@[A-Za-z0- 
9.-]+\.[A-Za-z]{2,6}\b)', 1, 4, 'i')as varchar(128)), '')  
|| coalesce(';' ||cast(regexp_substr(commentstring, '(\b[A-Za-z0-9._%+-]+@[A-Za-z0- 
9.-]+\.[A-Za-z]{2,6}\b)', 1, 5, 'i')as varchar(128)), '')) end as emailid
from tablesv

Input :

test1,sample1,abc@test.com;cde@test.com
test2,sample2,fgh@test.com;fgh@test.com,ghi@test.com
test3,sample3,hij@test.com;jkl@test.com;hij@test.com

Ouput: With the above query, not able to remove the duplicate email id.I need to keep one email if its duplicate in row.

test1,sample1,abc@test.com;cde@test.com
test2,sample2,fgh@test.com;ghi@test.com
test3,sample3,hij@test.com;jkl@test.com
Sak
  • 61
  • 6

2 Answers2

0

Assuming you have some sort of key column, I would split that column using strtok_split_to_table, get the distinct values, and them put them back together. In your examples, you seem to have 2 delimiters, so I used oreplace to make them all a comma.

EDIT - (based on comment) - we'll derive a numeric key to make it easier, although you might be able to just concatenate databasename and tablename and use that. But the point is, strtok_split_to_table can only take one column for the "key".

  with cte as 
    (select
    top 1 databasename, 
tablename,
'test2,sample2,fgh@test.com,fgh@test.com,ghi@test.com' as commentstring,
row_number() over (order by databasename,tablename) as rn
    from
    dbc.tablesv
    order by databasename,tablename
    )
    
    select 
    cte.databasename,cte.tablename,
    TRIM(TRAILING ',' FROM (XMLAGG(parsed_val || ',' ORDER BY parsed_val) (VARCHAR(500))))
    from
    (
    select distinct
    somekey,
    parsed_val
    from
    table (strtok_split_to_table(cte.rn,oreplace(cte.commentstring,';',','),',')
    returns (somekey integer,token_num integer,parsed_Val varchar(100))) as t
    ) t1
    inner join cte
    on t1.somekey = cte.rn
    group by
    cte.databasename,cte.tablename
Andrew
  • 8,445
  • 3
  • 28
  • 46
  • Hi Andrew,Databasename and tablename are primary keys.Sorry thats mistake one delimeter need to be there i.e ;.How to implement in the above code Andrew? – Sak Jul 13 '23 at 18:06
  • Same concepts apply, but see edits. – Andrew Jul 13 '23 at 18:27
  • Hi Andrew,Thank you for your help.I have tried running the query but its throwing error :Query execution failed Reason: If InKey is Decimal then OutKey should be DECIMAL and with same precision. – Sak Jul 14 '23 at 09:18
0

If you got a limited number of emails you can use this approach:

WITH cte AS
 ( SELECT ... -- other columns
      --split into emails (in a Derived Table/CTE) 
     ,coalesce(regexp_substr(commentstring, '(\b[A-Za-z0-9._%+-]+@[A-Za-z0-9.-]+\.[A-Za-z]{2,6}\b)', 1, 1, 'i'), '') as sub1
     ,coalesce(regexp_substr(commentstring, '(\b[A-Za-z0-9._%+-]+@[A-Za-z0-9.-]+\.[A-Za-z]{2,6}\b)', 1, 2, 'i'), '') as sub2
     ,coalesce(regexp_substr(commentstring, '(\b[A-Za-z0-9._%+-]+@[A-Za-z0-9.-]+\.[A-Za-z]{2,6}\b)', 1, 3, 'i'), '') as sub3
     ,coalesce(regexp_substr(commentstring, '(\b[A-Za-z0-9._%+-]+@[A-Za-z0-9.-]+\.[A-Za-z]{2,6}\b)', 1, 4, 'i'), '') as sub4
     ,coalesce(regexp_substr(commentstring, '(\b[A-Za-z0-9._%+-]+@[A-Za-z0-9.-]+\.[A-Za-z]{2,6}\b)', 1, 5, 'i'), '') as sub5
   FROM tablesv
 )
SELECT ... -- other columns
-- and then use brute force to find the unique values 
  ,sub1
   || case when sub2 not in (sub1)                then sub2 else '' end
   || case when sub3 not in (sub1,sub2)           then sub3 else '' end
   || case when sub4 not in (sub1,sub2,sub3)      then sub4 else '' end
   || case when sub5 not in (sub1,sub2,sub3,sub4) then sub5 else '' end
FROM cte
dnoeth
  • 59,503
  • 4
  • 39
  • 56
  • Hi dnoeth,really sorry I am new to this when you mention in derived table/cte how to do it.Do i need to create different table for this.Is there way I can use this in select itself? – Sak Jul 14 '23 at 15:38
  • Sorry for the late answer, both Derived Tables and Common Table Expressions are nested Selects. I changed my answer showing a Derived Table. – dnoeth Jul 17 '23 at 20:42