Hive INSERT INTO vs UNION ALL performance

Asked Jul 27 '18 at 05:33

Active Jul 27 '18 at 05:33

Viewed 805 times

Lets say I have two tables A and B with same schema and 1 Million rows each, I can add values in table A and B into a table C using either a UNION ALL or two INSERT INTO statements.

I actually did it and found that the INSERT INTO did better, but I would like to know why? And is UNION ALL better in any specific scenario?

asked Jul 27 '18 at 05:33

Ani Menon

27,209
16
105
126

Try to execute it on Tez, it will run each query in parallel and will put files into subfolders. – leftjoin Jul 27 '18 at 07:01
@leftjoin tez isn't configured on my cluster, more over I am interested to know what is the internal difference between the two methods? – Ani Menon Jul 28 '18 at 14:46
And how can we know internal difference on your cluster without logs and query plan. It may be misconfiguration problem. Internal difference depends on configuration much. – leftjoin Jul 28 '18 at 18:56
1

@leftjoin how `inner join` or `union all` works in hive is defined in its internal code, not on my cluster. Don't comment unless you have something useful to add. – Ani Menon Jul 29 '18 at 08:59
How Union all works also depends on configuration. For example this: `hive.exec.parallel` Query plan end execution logs may show how it is being executed and what is the reason of slowdown. – leftjoin Jul 29 '18 at 09:15

Hive INSERT INTO vs UNION ALL performance

0 Answers0