2

I have a HiveQL query which looks like below:

create table JOINED as select TABLEA.* from TABLEA join TABLEB on
TABLEA.key=TABLEB.key where nvl(TABLEA.attr, 0)=nvl(TABLEB.attr, 0);

But this query doesn't pick those rows where TABLEA.key=TABLEB.key and

  1. TABLEA.attr=NULL and TABLEB.attr=NULL. (OR)
  2. TABLEA.attr=0 and TABLEB.attr=NULL. (OR)
  3. TABLEA.attr=NULL and TABLEB.attr=0.

None of the above cases are picked. Why could this happen? Have I misunderstood the use of NVL()?

I expect the attr attribute to default to 0 if it is NULL. What would be the right query?

David דודו Markovitz
  • 42,900
  • 6
  • 64
  • 88
Anusha
  • 21
  • 3

1 Answers1

0

Thanks, I have just reported a bug -
Incorrect results for INNER JOIN ON clause / WHERE involving NVL / COALESCE

If you'll check the execution plan you'll see that for both tables we got the wrong predicate attr is not null.
Selecting columns from both tables (e.g. select TABLEA.*,TABLEB.key) seems to prevent the issue.

explain
select TABLEA.* from TABLEA join TABLEB on
TABLEA.key=TABLEB.key where nvl(TABLEA.attr, 0)=nvl(TABLEB.attr, 0);

STAGE DEPENDENCIES:
  Stage-4 is a root stage
  Stage-3 depends on stages: Stage-4
  Stage-0 depends on stages: Stage-3

STAGE PLANS:
  Stage: Stage-4
    Map Reduce Local Work
      Alias -> Map Local Tables:
        $hdt$_0:tablea 
          Fetch Operator
            limit: -1
      Alias -> Map Local Operator Tree:
        $hdt$_0:tablea 
          TableScan
            alias: tablea
            Statistics: Num rows: 1 Data size: 14 Basic stats: COMPLETE Column stats: NONE
            Filter Operator
              predicate: (key is not null and attr is not null) (type: boolean)
              Statistics: Num rows: 1 Data size: 14 Basic stats: COMPLETE Column stats: NONE
              Select Operator
                expressions: key (type: int), attr (type: int)
                outputColumnNames: _col0, _col1
                Statistics: Num rows: 1 Data size: 14 Basic stats: COMPLETE Column stats: NONE
                HashTable Sink Operator
                  keys:
                    0 _col0 (type: int), NVL(_col1,0) (type: int)
                    1 _col0 (type: int), NVL(_col1,0) (type: int)

  Stage: Stage-3
    Map Reduce
      Map Operator Tree:
          TableScan
            alias: tableb
            Statistics: Num rows: 1 Data size: 14 Basic stats: COMPLETE Column stats: NONE
            Filter Operator
              predicate: (key is not null and attr is not null) (type: boolean)
              Statistics: Num rows: 1 Data size: 14 Basic stats: COMPLETE Column stats: NONE
              Select Operator
                expressions: key (type: int), attr (type: int)
                outputColumnNames: _col0, _col1
                Statistics: Num rows: 1 Data size: 14 Basic stats: COMPLETE Column stats: NONE
                Map Join Operator
                  condition map:
                       Inner Join 0 to 1
                  keys:
                    0 _col0 (type: int), NVL(_col1,0) (type: int)
                    1 _col0 (type: int), NVL(_col1,0) (type: int)
                  outputColumnNames: _col0, _col1
                  Statistics: Num rows: 1 Data size: 15 Basic stats: COMPLETE Column stats: NONE
                  File Output Operator
                    compressed: false
                    Statistics: Num rows: 1 Data size: 15 Basic stats: COMPLETE Column stats: NONE
                    table:
                        input format: org.apache.hadoop.mapred.SequenceFileInputFormat
                        output format: org.apache.hadoop.hive.ql.io.HiveSequenceFileOutputFormat
                        serde: org.apache.hadoop.hive.serde2.lazy.LazySimpleSerDe
      Local Work:
        Map Reduce Local Work

  Stage: Stage-0
    Fetch Operator
      limit: -1
      Processor Tree:
        ListSink
David דודו Markovitz
  • 42,900
  • 6
  • 64
  • 88
  • Thank you for the response. However, the temporary solution of selecting columns from both tables doesn't seem to prevent the issue for me. Anyway, thanks a lot! – Anusha May 09 '17 at 04:36