1

While trying to isolate the 7 rows in the first query below, I got results that, as a novice, I was not expecting. I read over this SQLite document but do not understand why the 7 rows which are NULL are separated in the GROUP BY in the first query, but testing on != '', not in ('A','H'), and ='' all exclude the NULL rows.

It seems as though the tests are exclusive, such that NULL is either ='' or !='', or in ('A','H') or not in ('A','H'). It appears to be ignored by all of these, yet is separated in the GROUP BY.

Would you please explain why it works this way?

Thank you.

sqlite> select substr(trim(grammarCode),1,1) as c, count(indexRow) as cnt
        from tbl
        group by c
        order by cnt;
c  cnt   
-  ------
   7     
A  4828  
   20046 
H  300679

sqlite> select substr(trim(grammarCode),1,1) as c, count(indexRow) as cnt
        from tbl
        where c != ''
        group by c
        order by cnt;
c  cnt   
-  ------
A  4828  
H  300679

sqlite> select substr(trim(grammarCode),1,1) as c, count(indexRow) as cnt
        from tbl
        where c not in ('A', 'H')
        group by c
        order by cnt;
c  cnt   
-  ------
   20046 

sqlite> select substr(trim(grammarCode),1,1) as c, count(indexRow) as cnt
        from tbl
        where c = ''
        group by c
        order by cnt;
c  cnt   
-  ------
   20046 

sqlite> select substr(trim(grammarCode),1,1) as c, count(indexRow) as cnt
        from tbl
        where c is null
        group by c
        order by cnt;
c  cnt   
-  ------
   7     

forpas
  • 160,666
  • 10
  • 38
  • 76
Gary
  • 2,393
  • 12
  • 31

1 Answers1

1

A boolean expression in SQL evaluates as true or false or null.

The WHERE clause of any SQL statement filters out all rows for which the boolean expression/condition is not true, meaning false and null are filtered out.

All these boolean expressions:

null != ''
null = ''
null not in ('A', 'H')

are evaluated as null (demo), because any comparison of/to null without the use of the operator IS returns null.

This is why your 2nd, 3d and 4th queries filter out not only the rows that don't satisfy the condition in the WHERE clause but also the rows where c is null.

If you want these rows where c is null you must explicitly mention that:

c != '' OR c IS NULL
c = '' OR c IS NULL
c not in ('A', 'H') OR c IS NULL

Or, for the first 2 cases you can use the operator IS:

c IS NOT ''
c IS ''
forpas
  • 160,666
  • 10
  • 38
  • 76
  • Thank you. May I ask a follow-up question please? Since this is data that I'm building and have direct control over, is it always in my best interest to ensure that the final tables never include NULL? The data above was the result of running a programmatic extract on textual files and then checking the results for reasonableness and "cleaning" errors. I don't know why 7 are null rather then empty/blank but I can change all null to blank or something appropriate to each column and write the table to one with contraints that do not permit null. – Gary Jan 15 '23 at 20:59
  • `null` is a valid value in SQL and indicates *unknown*: https://en.wikipedia.org/wiki/Null_(SQL) Depending on your requirement you may decide if a column may allow nulls or not. If yes then you should always check them with the operator IS. It may be confusing at the start but it all makes sense after a while. But empty strings are not *unknown* values and I would avoid them because they are meaningless in most cases. – forpas Jan 16 '23 at 08:51