1

I have the following information

id     user                 date                          test

102   123~456~897   01JAN2019~6JUL2018~21DEC2017        abc~qwer~~wer~
103  13~45~88~34   01JAN2020~6JUN2019~21JAN2018~12MAR2017 pqr~~qw~wer*123~qwerty

The result I need, the records must be expanded based on ascending date.

id      user     date            test 

102     897      21DEC2017        wer
102     456      06JUL2018        qwer
102     123      01JAN2019        abc
103     34       12MAR2017        qwerty
103     88       21JAN2018        wer*123
103     45       06JUN2019        qw
103     13       01JAN2020        pqr

I tried with lateral View posexplode, but it's not working properly, please help.

a.id, 
    from demo a
    lateral view posexplode(a.test,'\\~')) t1 as exploded_test,test
    lateral view posexplode(a.user,'\\~')) t2 as exploded_user, user
    lateral view posexplode(a.date,'\\~')) t3 as exploded_date, date
    where exploded_test == exploded_user and exploded_user == exploded_date
    order by 1,3
mck
  • 40,932
  • 13
  • 35
  • 50
shanmukh
  • 57
  • 5
  • 1
    Can you format your data to understand problem? what I understood is you've 3 input rows , if possible can you break down your input rows to columns based on your delimiter and then format your desired output ? – Naga Nov 07 '19 at 17:21
  • Did my best to format, can some one give me tips on how to format properly, thanks – shanmukh Nov 07 '19 at 18:44

1 Answers1

1

Tilde (~) is not a metacharacter in Java regex (Hive uses Java regex) and does not need escaping. Also posexplode or explode can be applied to array, you need to split string to get an array before exploding. Use posexplode(split(a.user,'~+')) t1 as (pos,user)

Also (this is just an opinion): I would not name the array position as "exploded_test". pos or position or simply p would look less confusing

leftjoin
  • 36,950
  • 8
  • 57
  • 116
  • thanks , but in my question above , there are multiple ~ separating the values, I want to ignore multiple occurances of ~ and only consider single occurance of ~. Is it possible to achieve that with posexplode – shanmukh Nov 07 '19 at 21:58
  • @shanmukh According to the desired result in the question, I believe you are trying to consider multiple tildes as the single one and split by it, right? Then use `'~+'` in the regexp, it means: one or more tilde. Edited the answer – leftjoin Nov 08 '19 at 04:45