Looking for the most efficient way to check for nulls and have a desired output for a report. This is done in a Hadoop environment.
For example, Database contains:
FirstName | LastName | State |
---|---|---|
John | {null} | OH |
Jane | Smith | {null} |
Pikachu | Pokemon | HI |
The output from the query would be:
RuleNum | ColName | DefectCount | TableCount | DateofData |
---|---|---|---|---|
Rule001 | FirstName | 0 | 50 | 2023-01-01 |
Rule002 | LastName | 1 | 50 | 2023-01-01 |
Rule003 | State | 1 | 50 | 2023-01-01 |
The embarassingly clunky and inefficient query that that I use, which works- but it takes a LONG time to get results is(and we check thousands of columns)- and i'm unsure as to how to get the table count included efficiently:
SELECT
CAST('Rule001' AS VARCHAR(255)) AS RuleNum,
CAST('FirstName' AS VARCHAR(255)) AS ColName,
A.DefectCount AS DefectCount,
CAST(('date_add(current_date(),-1)) AS VARCHAR(255) AS DateofData
FROM
(SELECT COUNT(1) AS DefectCount
FROM DbName.TableName
WHERE FirstName IS NULL
UNION ALL
SELECT
CAST('Rule002' AS VARCHAR(255)) AS RuleNum,
CAST('LastName' AS VARCHAR(255)) AS ColName,
A.DefectCount AS DefectCount,
CAST(('date_add(current_date(),-1)) AS VARCHAR(255) AS DateofData
FROM
(SELECT COUNT(1) AS DefectCount
FROM DbName.TableName
WHERE LastName IS NULL
UNION ALL
SELECT
CAST('Rule003' AS VARCHAR(255)) AS RuleNum,
CAST('State' AS VARCHAR(255)) AS ColName,
A.DefectCount AS DefectCount,
CAST(('date_add(current_date(),-1)) AS VARCHAR(255) AS DateofData
FROM
(SELECT COUNT(1) AS DefectCount
FROM DbName.TableName
WHERE State IS NULL