I have a table with two columns (`my_col` and `ID`) and many records. Here are two sample records:

Record 1: `my_col` = `{"XXX": ["123","456"],"YYY": ["246","135"]}`, `ID` = `A123`
Record 2: `my_col` = `{"ZZZ":["333"]}`, `ID` = `B222`
I need to flatten this using PySpark into the following shape:
| Key | Value | ID |
|---|---|---|
| XXX | 123 | A123 |
| XXX | 456 | A123 |
| YYY | 246 | A123 |
| YYY | 135 | A123 |
| ZZZ | 333 | B222 |
Can anyone help with this? For reference, I have already achieved the result in Hive:
```sql
select split(element, ':')[0] key,
       e.value,
       id
from (
    select regexp_replace(e.element, '^\\{|"| *\\[|\\]|\\}$', '') element,
           t.id
    from input_df t
    lateral view explode(split(my_col, '(?<=\\]) *, *(?=\\")')) e as element
) s
lateral view explode(split(split(element, ':')[1], ',')) e as value
```