1

i need help on parsing strings which are separated by pipe. the first number is the key, and the subsequent string after pipe are values for that key in hundreds

|01|00109394|05|84|08|34353637

this should be broken into keys which are the length=2 number after the first pipe values: after each key

  • first_key = 01, values = 00, 10, 93, 94
  • second_key = 05, values = 84
  • so on...

the script should break the string and explode to rows so that the column "key" will have all the keys, the column "values" will have all the all the values

sample output

KEY Value
01  00
01  10
01  93
01  94
05  84
08  34
08  35

Here is the code i wrote but not working

Select my_key, explode(str_to_map(my_key,'[|]','[|]')) as (Key, Value)
from test_table;

Please suggest how to parse this string

leftjoin
  • 36,950
  • 8
  • 57
  • 116
Samy
  • 47
  • 7
  • 3
    What DBMS is this for and what does "not working" mean in detail? – sticky bit Dec 19 '19 at 19:09
  • 1
    Also consider fixing that schema and storing the data in a normalized way from the start. There should be a table with one column for the key, another for the value and maybe one for an ordinal if order matters here. Per value there should be one row in that table. – sticky bit Dec 19 '19 at 19:10
  • The query will fetch the data from hive. I cannot fix the schema as it is the data collected as logs and must be kept like that. It is not working because it splitting the string by pipe transform the entire string as "values". I need it to split in a way that it reads the "key" and then subsequent values as "values" to the key. then it should read the second key and put in the next row. – Samy Dec 19 '19 at 19:52
  • I don't know how can i fetch the keys from the string when both keys and values are separated by same pipe delimiter "|" – Samy Dec 19 '19 at 19:53
  • Which [DBMS](https://en.wikipedia.org/wiki/DBMS) product are you using? "SQL" is just a query language, not the name of a specific database product. [Why should I tag my DBMS](https://meta.stackoverflow.com/questions/388759/why-should-i-tag-my-rdbms) –  Dec 19 '19 at 21:48

1 Answers1

0

I did it using split and explode two times. First time to get key and value not splitted and second time to separate values by two characters. Read comments in the code:

with your_data as (
select stack (1, '|01|00109394|05|84|08|34353637') as str
)

select --s.initial_str, 
       s.key, v.val
from
(
select s.pos, s.initial_str, s.key, s.val
from
(
select s.initial_str,
       s.pos, --0 and even positions are keys, odd are values
       s.val as key, 
       --assign the value to each key, which is the next eploded value 
       lead(val,1) over(partition by s.initial_str order by s.pos) as val --some keys from main table are in partition by clause
from       
( --explode string
select d.str initial_str, s.pos, s.val
  from your_data d lateral view outer posexplode(split(regexp_replace(str,'(^\\|)|(\\|$)',''),'\\|')) s as pos, val --remove leading and trailing pipe and explode
) s
)s
where s.pos%2=0 --filter keys with assigned values only (0 and even are rows we need)
) s 
--explode each two chars
--(?<=\\G.{2}) matches an empty string that has the last match (\\G) followed by two characters (.{2}) before it (?<=)
lateral view outer explode(split(s.val,'(?<=\\G.{2})')) v as val 
where v.val!=''
;

Result:

01      00
01      10
01      93
01      94
05      84
08      34
08      35
08      36
08      37
leftjoin
  • 36,950
  • 8
  • 57
  • 116