Whenever I run the collect_list function in Hive, it always throws this error:

Query ID = xxxxx
Total jobs = 1
Launching Job 1 out of 1
Failed to get session
FAILED: Execution Error, return code 1 from org.apache.hadoop.hive.ql.exec.tez.TezTask

Here's the example:

Data:

id    value
1     A
1     B
2     C
3     D

I run the following query in the Hive terminal:

SELECT id, collect_list(value) FROM something GROUP BY id;

I want a result like this:

id    value
1     A, B
2     C
3     D

Do I need to configure something before using the collect_list function? Thank you.

Deo

2 Answers

You should be grouping by id:

SELECT collect_list(value) FROM something group by id;

hlagos

collect_list uses an ArrayList, so the values are kept in the order in which they were added. To control that order, use a SORT BY clause in a subquery. Don't use ORDER BY, because it forces the whole result through a single reducer, so the query runs in a non-distributed way.

SELECT id, COLLECT_LIST(value)
FROM (SELECT * FROM something SORT BY id, value DESC) x
GROUP BY id;
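
If the goal is the comma-separated string shown in the question (for example "A, B" for id 1) rather than an array, the list can be sorted and joined in the same query. This is only a sketch: it assumes that value is a STRING column and that an ascending order within each id is what is wanted. The alias value_list is made up for illustration.

```sql
-- Sketch only: assumes the table `something` from the question
-- and that `value` is a STRING column.
SELECT id,
       -- sort_array orders the collected values ascending;
       -- concat_ws joins the array elements with ', '
       concat_ws(', ', sort_array(collect_list(value))) AS value_list
FROM something
GROUP BY id;
```

This changes only the shape of the output. It does not address the "Failed to get session" error, which is raised while the Tez job is being launched.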
  • Thank you for your response. However, when I tried SORT BY, the error was still the same. I don't know what is wrong here. – Deo Apr 21 '17 at 06:05