2

I have data in below format in hive. table test(seq string, result string);

|seq  | result                                                                                                                                                                                                                                                                                                                                                 |
----------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------
|0001 | [{"offerId":"Default_XYZ","businessName":"Apple","businessGroup":"Default","businessIssue":"Default","interactionId":"-4930126168287369915","campaignID":"P-1","rank":"1"},{"offerId":"Default_NAV","businessName":"Orange","businessGroup":"Default","businessIssue":"Default","interactionId":"-7830126168223452134","campaignID":"P-1","rank":"2"}] |

Output should be like

|seq  | offerId     | businessName   | businsesGroup| businessIssue | interactionId        | campaignId | rank |
----------------------------------------------------------------------------------------------------------------
|0001 | Default_XYZ | Apple          | Default      | Default       | -4930126168287369915 | P-1        | 1    |
|0001 | Default_NAV | Orange         | Default      | Default       | -7830126168223452134 | P-1        | 2    |

I tried to convert string to Array of struct, but it didnt work with direct CAST.

Any help please?

[EDIT - Tried below query]

 select sequenceNumber, offerId, businessName, rank from (

 select sequenceNumber,
        collect_list(oid['offerId']) as offerid_list
       , collect_list(oid['businessName']) as businessName_list
        ,collect_list(oid['rank']) as rank_list
  from (
 select sequenceNumber,
        str_to_map(translate(offer_Id,'{}','')) as oid

        from test
        lateral view explode (split(translate(result, '[]"',''),"\\},")) oid as offer_id
    ) x
    group by sequenceNumber

      ) y lateral view explode(offerid_list) olist as offerId
      lateral view explode(businessName_list) olist as businessName
      lateral view explode(rank_list) rlist as rank
Naveen
  • 425
  • 12
  • 28
  • any queries that you have tried so far? – hlagos Mar 01 '18 at 05:58
  • Yes. Tried some, but not getting required results. Edited my question with that query. – Naveen Mar 01 '18 at 06:36
  • I just want make sure that I understand, but it looks like you have 2 string columns, one of those is a json. You should be able to use a json serde if you are able to add your seq as prat of the json. – hlagos Mar 01 '18 at 06:53

2 Answers2

2

Found one solution to my question:

select                                                   
seq, 
split(split(results,",")[0],':')[1] as offerId,
split(split(results,",")[1],':')[1] as businessName,
split(split(results,",")[2],':')[1] as businessGroup,
split(split(results,",")[3],':')[1] as businessIssue,
split(split(results,",")[4],':')[1] as interactionId,
split(split(results,",")[5],':')[1] as campignId
regexp_replace(split(split(results,",")[6],":")[1], "[\\]|}]", "") as  rank

from
(
  select seq,
     split(translate(result), '"\\[|]|\""',''), "},") as r
      from test  
) t1
LATERAL VIEW explode(r) rr AS results
Naveen
  • 425
  • 12
  • 28
1

You may try get_json_object function.

select seq, get_json_object(result,'$\[0].offerId') as offerId,
            get_json_object(result,'$\[0].businessName') as businessName,
            get_json_object(result,'$\[0].businsesGroup') as businsesGroup,
            get_json_object(result,'$\[0].businessIssue') as businessIssue,
            get_json_object(result,'$\[0].interactionId') as interactionId,
            get_json_object(result,'$\[0].campaignId') as campaignId,
            get_json_object(result,'$\[0].rank') as rank
    from t
    UNION ALL
select seq, get_json_object(result,'$\[1].offerId') as offerId,
            get_json_object(result,'$\[1].businessName') as businessName,
            get_json_object(result,'$\[1].businsesGroup') as businsesGroup,
            get_json_object(result,'$\[1].businessIssue') as businessIssue,
            get_json_object(result,'$\[1].interactionId') as interactionId,
            get_json_object(result,'$\[1].campaignId') as campaignId,
            get_json_object(result,'$\[1].rank') as rank
    from t
Kaushik Nayak
  • 30,772
  • 5
  • 32
  • 45
  • Thanks Kaushik. What if result array contians more than 2 elements? We cannot dynamically change the hive query. – Naveen Mar 01 '18 at 22:33
  • @Naveen : I am not sure if that could be done. But, as hlagos suggested, you may use a JSON serde for such purpose. Even your solution would not be perfoming so well for large datasets, due to multiple function calls. So, you would probably need a different approach. Wish you all the best. – Kaushik Nayak Mar 02 '18 at 02:19