1

I am doing posexplode, How to remove duplicated rows caused from parsing of the columns? I cant do distinct because there are null on several columns (before parsing).

example before posexplode:

id   | cofe         |
AAA  |  |||-9000| 4 |
BBB  |   5|90       |
CCC  |              |
DDD  |  6|||||      |
EEE  |              |

unfortunately, the results

id   | cofe
AAA  |  
AAA  |  
AAA  |  -9000
AAA  |    4
BBB  |    5
BBB  |   90
CCC  |   
DDD  |   6
DDD  |   
DDD  |   
DDD  |   
DDD  |   
EEE  |

expected results

id   | cofe
AAA  |  -9000
AAA  |    4
BBB  |    5
BBB  |   90
CCC  |   
DDD  |   6 
EEE  |
SELECT qq.id, 
ss.cofe,
ss.fnte,
ss.cnte
from
(
select id, 
sequence,  
split (BMWA, '~')[15] AS CFEEA
split (BMWA, '~')[16] AS FTAAA 
split (BMWA, '~')[17] AS CNTTA
FROM 
( 
select id,  
sequence,  
replace(bmw, '^','~') AS BMWA
from tablee
)rr 
)qq
lateral view posexplode(split(replace(qq.CFEEA'|','~'),'~')) ss as r, cofe,
lateral view posexplode(split(replace(qq.FTAAA'|','~'),'~')) ss as r, fnte
lateral view posexplode(split(replace(qq.CNTTA'|','~'),'~')) ss as r, cnte

any ideas will be appreciated!

leftjoin
  • 36,950
  • 8
  • 57
  • 116

1 Answers1

1

If you want to skip empty elements when splitting string, replace consecutive delimiters with single one before splitting, also remove delimiters in the beginning and in the end.

For example for '|||-9000| 4' (delimited by pipe)

select  split(
        regexp_replace(
        --replace consecutive 2+ delimiters with single one
        regexp_replace ('|||-9000| 4','\\|{2,}','|'), --gives '|-9000| 4'
        --remove start and end delimiter
        '^\\||\\|$',''),                              --gives '-9000| 4'
        --split
        '\\|')                                        --gives array ["-9000"," 4"]

Example with your data:

with mytable as (
select stack (5,
'AAA','|||-9000| 4',
'BBB',' 5|90',
'CCC','',
'DDD','6|||||',
'EEE',''
) as (id,cofe )
)

select id, e.val as cofe
  from mytable
       lateral view outer posexplode(
       split(
        regexp_replace(
        regexp_replace (cofe,'\\|{2,}','|'), 
        '^\\||\\|$',''),                          
        '\\|')
      ) e as pos, val

Result:

    id  cofe    
   AAA    -9000
   AAA    4
   BBB    5
   BBB    90
   CCC
   DDD    6
   EEE

Also multiple LATERAL VIEW posexplode can produce cartesian product of exploded values for each row. See this answer of how to explode multiple arrays of different length position-wise https://stackoverflow.com/a/56244552/2700344

leftjoin
  • 36,950
  • 8
  • 57
  • 116