7

I have a table like

name            string                                      
address         string                                      
timezone        string                                      
one_key_value   array<struct<key:string,value:array<string>>                    
two_key_value   array<struct<key:string,value:array<string>>

and want to convert it to

name            string                                      
address         string                                      
timezone        string                                      
one_key_value   map<string,array<string>>                       
two_key_value   map<string,array<string>>

using presto. There is lateral view inline but it doesn't really work in presto. How can I do this?

John Constantine
  • 411
  • 1
  • 6
  • 15

2 Answers2

3

Based on the provided information, you basically need two things:

  1. Upgrade to something more recent -- latest from Maven Central / Github, or a Starburst-supported (and free) distribution from https://www.starburstdata.com/starburst-presto-sql/ (disclaimer: I am from Starburst). 0.175 is really a bit out of date.
  2. Use map_from_entries(one_key_value) (docs: https://trino.io/docs/current/functions/map.html#map_from_entries)
Piotr Findeisen
  • 19,480
  • 2
  • 52
  • 82
  • I don't think an update is possible. Is there a way around without upgrading? – John Constantine Oct 29 '18 at 19:58
  • I would try backporting the code from https://github.com/prestodb/presto/blob/master/presto-main/src/main/java/com/facebook/presto/operator/scalar/MapFromEntriesFunction.java to your version (start with the oldest commit containing the function, `master`'s version will be harder to backport). You can build Presto from source or add this as a plugin. – Piotr Findeisen Oct 29 '18 at 20:34
  • The main issue I have is that map in presto is map(array(K), array(V)) -> map(K, V) , is there a way to do map(varchar, array(varchar)). – John Constantine Oct 30 '18 at 15:49
  • I don't have 0.175 lying around to test any potential solution to this. – Piotr Findeisen Oct 31 '18 at 08:04
  • Cool, do you know how I can split the column into array and array> – John Constantine Oct 31 '18 at 17:45
2

I have not tested thought, but below expression should help:

map( transform(one_key_value, e -> e.key), transform(one_key_value, e -> e.value))
map( transform(two_key_value, e -> e.key), transform(two_key_value, e -> e.value))

AS per Presto 0.175 docs:

map(array, array) → map Returns a map created using the given key/value arrays.

SELECT map(ARRAY[1,3], ARRAY[2,4]); -- {1 -> 2, 3 -> 4}

We can use array transform function to build the array of keys and values from input field ( array<struct<key:string,value:array<string>>)

transform(array, function) → ARRAY Returns an array that applies function to each element of array

skadya
  • 4,330
  • 19
  • 27
  • Awesome, Thanks a lot :) Do you have an idea about what needs to be done to get the exact result in hive? – John Constantine Nov 05 '18 at 04:08
  • @JohnConstantine , I am not sure if we have built-in function in hive to achieve this. Thought it should be possible via custom [hive custom UDF](https://cwiki.apache.org/confluence/display/Hive/LanguageManual+UDF#LanguageManualUDF-CreatingCustomUDFs). – skadya Nov 05 '18 at 14:58