7

I have a table like below with columns A(int) and B(string):

A   B
1   a,b,c
2   d,e
3   f,g,h

I want to create an output like below:

A    B
1    a
1    b
1    c
2    d
2    e
3    f
3    g
3    h

If it helps, I am doing this in Amazon Athena (which is based on presto). I know that presto gives a function to split a string into an array. From presto docs:

split(string, delimiter) → array
Splits string on delimiter and returns an array.

Not sure how to proceed from here though.

ishan3243
  • 1,870
  • 4
  • 30
  • 49
  • Not an answer, but you should generally avoid storing CSV in your tables, for the very reason you are now seeing; it makes the data much harder to work with. – Tim Biegeleisen Jun 27 '18 at 13:15

1 Answers1

17

Use unnest on the array returned by split.

SELECT a,split_b 
FROM tbl
CROSS JOIN UNNEST(SPLIT(b,',')) AS t (split_b)
Vamsi Prabhala
  • 48,685
  • 4
  • 36
  • 58
  • how is it ensured that 'a' is joined only with the values stored against it? Doesn't seem evident from the query. Is unnest treating column-A as pivot? – ishan3243 Jun 27 '18 at 13:45
  • @ishan3243 see examples in the documentation https://prestodb.io/docs/current/sql/select.html#unnest – Piotr Findeisen Jun 28 '18 at 06:15