Hive explain plan where to see full table scan?

Question

How can I see from hive EXPLAIN is there a full table scan?

For example, is there a full scan? The table size is 993 rows.

The query is

explain select latitude,longitude FROM CRIMES WHERE geohash='dp3twhjuyutr'

I have secondary index on geohash column.

STAGE PLANS:
      Stage: Stage-1
        Map Reduce
          Map Operator Tree:
              TableScan
                alias: crimes
                filterExpr: (geohash = 'dp3twhjuyutr') (type: boolean)
                Statistics: Num rows: 993 Data size: 265582 Basic stats: COMPLETE Column stats: NONE
                Filter Operator
                  predicate: (geohash = 'dp3twhjuyutr') (type: boolean)
                  Statistics: Num rows: 496 Data size: 132657 Basic stats: COMPLETE Column stats: NONE
                  Select Operator
                    expressions: latitude (type: double), longitude (type: double)
                    outputColumnNames: _col0, _col1
                    Statistics: Num rows: 496 Data size: 132657 Basic stats: COMPLETE Column stats: NONE
                    File Output Operator
                      compressed: false
                      Statistics: Num rows: 496 Data size: 132657 Basic stats: COMPLETE Column stats: NONE
                      table:
                          input format: org.apache.hadoop.mapred.SequenceFileInputFormat
                          output format: org.apache.hadoop.hive.ql.io.HiveSequenceFileOutputFormat
                          serde: org.apache.hadoop.hive.serde2.lazy.LazySimpleSerDe

  Stage: Stage-0
    Fetch Operator
      limit: -1
      Processor Tree:
        ListSink

leftjoin · Accepted Answer · 2019-05-25T11:44:10.797

3

Absence of partition predicate in the plan means full scan. Of course this is not about predicate push-down in ORC.
Check Data size and Num rows in each operator.
EXPLAIN DEPENDENCY command will show all input_partitions collection and you can check what exactly will be scanned.

edited May 25 '19 at 11:44

answered May 24 '19 at 17:10

leftjoin

36,950
8
57
116

Please, look at the edit pf the question. Thank you – Markiza May 30 '19 at 08:24
@Markiza Your index is not used in the plan. Also the table is too small and using Hive index is unnecessary overhead. Also Hive Indexes is a bit outdated concept. Do not use Hive Indexes, use ORC and partitions if applicable. ORC has built-in indexes and you do not need to manage them, it is done automatically in case of ORC. Currently you are using Sequence file format. Read this: https://community.hortonworks.com/articles/68631/optimizing-hive-queries-for-orc-formatted-tables.html – leftjoin May 30 '19 at 09:07
Thank you, my goal was just to to try index in work and check the execution plan, so in my plan it's a full scan, right? Yes, I know, indexes are removed in 3rd Hive. – Markiza May 30 '19 at 12:00
@Markiza Right. Yes it is. – leftjoin May 30 '19 at 12:09

Hive explain plan where to see full table scan?

1 Answers1