0

I have two tables, tbl_a and tbl_b, both formatted as ORC, and partitioned on the column dt. One table uses a partition format %Y%m%d%H%M whereas the other table uses %Y%m%d.

When I look at the execution plan of a simple SELECT + WHERE statement in Hive 1.2.1.2.4 (using TEZ 0.7.0.2.4), I only see the predicate filter set for tbl_a, but not the other tbl_b. This means, that in tbl_b a full table scan will be executed and not only a scan over the desired partition. Both queried partitions exist and contain data.

hive> EXPLAIN SELECT * FROM tbl_a WHERE dt='1001010600';
OK
Plan not optimized by CBO.

Stage-0
  Fetch Operator
     limit:-1
     Select Operator [SEL_2]
        outputColumnNames:["_col0","_col1","_col2","_col3"]
        Filter Operator [FIL_4]
           predicate:(dt = '1001010600') (type: boolean)
           TableScan [TS_0]
              alias:tbl_a

Time taken: 0.866 seconds, Fetched: 12 row(s)

hive> EXPLAIN SELECT * FROM tbl_b WHERE dt='161001';
OK
Plan not optimized by CBO.

Stage-0
  Fetch Operator
     limit:-1
     Select Operator [SEL_2]
        outputColumnNames:["_col0","_col1","_col2","_col3"]
        TableScan [TS_0]
           alias:tbl_b

Time taken: 0.904 seconds, Fetched: 10 row(s)

This behavior doesn't look very clear to me. Which criteria decides if a where condition is used as predicate filter or not?

  • It seems these plans are not full listings of what explain command returned. Have you removed some rows. There are no information about tables statistics. How many rows/partitions/size are there is unknown. Is it possible that second table contains the only partition? And why don't you use CBO? – leftjoin Dec 21 '16 at 15:56
  • This is the full output of the `explain` statement. both tables contain more than the queried partition; `tbl_a` contains approx 100 partitions (~5GB per partition devided by approx 20 files), `tabl_b` contains 10 partitions (~500MB per partition divided by 2 files). CBO is off because i want to understand/test the predicate behavior. – Christoph Körner Dec 21 '16 at 21:58
  • 1
    Have checked this. For simple queries without joins even if CBO is enabled is says: Plan not optimized by CBO. Also it says nothing about predicates and partition pruning, just table scan like in your example but it works fine, selects only partition specified in predicate – leftjoin Dec 22 '16 at 10:17
  • Thanks a lot for the feedback. I still find it confusing that in one case it is written as predicate filter in the execution plan and in the other it isnt. – Christoph Körner Dec 22 '16 at 12:46
  • Tez query plan optimization seems to be responsible for this. If the query engine is set to MR the explain statement looks like expected. – Christoph Körner Dec 22 '16 at 14:02

0 Answers0