Comma delimited string to individual rows - Impala SQL

Question

Let's suppose we have a table:

Owner   | Pets
------------------------------
Jack    | "dog, cat, crocodile" 
Mary    | "bear, pig"

I want to get as a result:

Owner   | Pets
------------------------------
Jack    | "dog"
Jack    | "cat"
Jack    | "crocodile" 
Mary    | "bear"
Mary    | "pig"

I found some solutions to similar problems by googling, but Impala SQL does not offer any of these capabilities to apply the suggested solutions.

Any help would be greatly appreciated!

Can you add some details of what you tried and what the results were, so that people don't duplicate your failed efforts? — shoover, May 23 '16 at 23:01
What you want to do is "flatten" a nested data structure, and that nested thing violates the whole entity-relation concept. So it cannot be done with **standard** SQL; only with database-specific extensions, which may or may not exist, and will differ widely from database to database. Sorry, Impala was not designed for this kind of ugly ETL stuff; you are supposed to prepare the data with Pig or Hive or whatever, load it in a Hive table, then expose it to Impala with `REFRESH ` — Samson Scharfrichter, May 24 '16 at 08:28

score 5 · Answer 1 · answered Apr 05 '17 at 09:55

5

The following works in Impala:

split_part(string source, string delimiter, bigint n)

You can find the documentation here:

https://www.cloudera.com/documentation/enterprise/5-9-x/topics/impala_string_functions.html

answered Apr 05 '17 at 09:55

Timothée HENRY

14,294
21
96
136

3

This solution would only work for one token at a time. It doesn't cleanly flatten the values into individual rows without an explicit call to each index position. Technically this could be a workaround, but will be quite verbose and messy. – eperks Apr 28 '20 at 22:59

score 1 · Answer 2 · answered Sep 02 '20 at 14:07

You have to do this with Hive SQL.

So lets create the impala table

-- imapla sql
CREATE TABLE IF NOT EXISTS tmp.my_example (
  `Owner` VARCHAR,
  Pets VARCHAR
);

INSERT INTO tmp.my_example (`Owner`, `Pets`) VALUES
('Jack', 'dog, cat, crocodile'),
('Mary', 'bear, pig');

Then use hive sql to split the column to rows:

-- hive sql
select f.owner as owner, t1.pos as pos, t1.val AS pet
from tmp.my_example f
lateral view posexplode(split(pets,', ')) t1

If you save your results as a new table don't forget to run refresh new_table in impala so that your new table can be used

P.S. Hive is incredible slow for this small table but performs reasonable for real data

This is the solution i think. Fallow this Link https://www.edureka.co/community/30472/hadoop-hive-how-to-split-a-single-row-into-multiple-rows — Rajitha Fernando, Mar 18 '21 at 17:44

score 1 · Answer 3 · answered Jun 05 '22 at 12:47

Try this one in Impala:

    select t,split_part(t,',',RT) cc
    from (
    select '12,33,4' t
    union all 
    select '12,23'   t
    ) T1
    inner join(
    select row_number() over(partition by 1 order by 1 ) RT
    from test.any_table_name limit 10) T2
    where T2.RT<=length(t) - length(replace(t,',','')) +1
    order by t,RT

score 0 · Answer 4 · answered Aug 24 '23 at 14:21

I want to improve a bit the answer from @Lele Han so it's clearer how to adapt it to your use case:

SELECT Owner
    , split_part(Pets, '|', PetPos) AS Pet
FROM (
    SELECT Owner, Pets
        , length(Pets) - length(replace(Pets, ',', '')) + 1 AS PetNumber
    FROM your_table
) t1
INNER JOIN (
    SELECT row_number() over(partition by 1 order by 1) PetPos
    FROM your_table
    LIMIT 100  -- a reasonable limit of number of pets
) t2
WHERE t2.PetPos <= t1.PetNumber

Comma delimited string to individual rows - Impala SQL

4 Answers4

Linked