1

Lets say I have two tables A and B with same schema and 1 Million rows each, I can add values in table A and B into a table C using either a UNION ALL or two INSERT INTO statements.

I actually did it and found that the INSERT INTO did better, but I would like to know why? And is UNION ALL better in any specific scenario?

Ani Menon
  • 27,209
  • 16
  • 105
  • 126
  • Try to execute it on Tez, it will run each query in parallel and will put files into subfolders. – leftjoin Jul 27 '18 at 07:01
  • @leftjoin tez isn't configured on my cluster, more over I am interested to know what is the internal difference between the two methods? – Ani Menon Jul 28 '18 at 14:46
  • And how can we know internal difference on your cluster without logs and query plan. It may be misconfiguration problem. Internal difference depends on configuration much. – leftjoin Jul 28 '18 at 18:56
  • 1
    @leftjoin how `inner join` or `union all` works in hive is defined in its internal code, not on my cluster. Don't comment unless you have something useful to add. – Ani Menon Jul 29 '18 at 08:59
  • How Union all works also depends on configuration. For example this: `hive.exec.parallel` Query plan end execution logs may show how it is being executed and what is the reason of slowdown. – leftjoin Jul 29 '18 at 09:15

0 Answers0