18

I am trying to create a table (table 2) in Hive from another table (table 1). I am trying to exclude certain rows with NULL values and tried the following condition.

    insert overwrite table table2 partition (date = '2013-06-01')
    select column1, column 2....
    from table1
    where column1 is not NULL or column1 <> '';

However, when I try this following query with the new table I get 300+ rows with NULL vaues:

    select count(*) from table2 where column1 is NULL;

Could someone point to what is causing these NULL values?

Thank you.

Ravi

Ravi
  • 3,223
  • 7
  • 37
  • 49

6 Answers6

30

Firstly — I don't think column1 is not NULL or column1 <> '' makes very much sense. Maybe you meant to write column1 is not NULL and column1 <> '' (AND instead of OR)?

Secondly — because of Hive's "schema on read" approach to table definitions, invalid values will be converted to NULL when you read from them. So, for example, if table1.column1 is of type STRING and table2.column1 is of type INT, then I don't think that table1.column1 IS NOT NULL is enough to guarantee that table2.column1 IS NOT NULL. (I'm not sure about this, though.)

ruakh
  • 175,680
  • 26
  • 273
  • 307
8

Try to include length > 0 as well.

column1 is not NULL AND column1 <> '' AND length(column1) > 0 
zthomas.nc
  • 3,689
  • 8
  • 35
  • 49
ShikharDua
  • 9,411
  • 1
  • 26
  • 22
2

What is the datatype for column1 in your Hive table? Please note that if your column is STRING it won't be having a NULL value even though your external file does not have any data for that column.

1

Try using isnull(a), isnotnull(a), nvl(), etc. On some versions(potentially in conjunction with the server settings - atleast with the one I am working on) of hive the 'IS NULL' and 'IS NOT NULL' syntax does not execute the logic when it is compiled. Check here for more information.

0

To check for the NULL data for column1 and consider your datatype of it is String, you could use below command :

select * from tbl_name where column1 is null or column1 <> '';
Indrajeet Gour
  • 4,020
  • 5
  • 43
  • 70
0

I use below sql to exclude the null string and empty string lines.

select * from table where length(nvl(column1,0))>0

Because, the length of empty string is 0.

select length('');
+-----------+--+
| length()  |
+-----------+--+
| 0         |
+-----------+--+
camash
  • 3
  • 3
  • Above query discards only `''` values . If value is `null` then above query will return 0, for which length is 1. – sjd Apr 24 '20 at 16:01