1

I writing a DataQualityOperator in a DAG. It should check if there's data in a Redshift table. To do this, I would like to check if the primary column contains null values. With sql, I found the name of the column Primary key. How do I check if it contains null values?(which means that the table is not good in my case).

class DataQualityOperator(BaseOperator):
        check_template = """
                    SELECT a.attname
                    FROM   pg_index i
                    JOIN   pg_attribute a ON a.attrelid = i.indrelid
                                        AND a.attnum = ANY(i.indkey)
                    WHERE  i.indrelid = 'tablename'::regclass
                    AND    i.indisprimary;
        """

        def __init__ (self,redshift_conn_id = "", target_table="", *args, **kwargs):
             super(...)  

        def execute(self, context):
            self.log.info(f'DataQualityOperator processing {target_table}')
            redshift = PostgresHook(postgres_conn_id=self.redshift_conn_id)

            check_records = redshift.get_records(check_template.format(self.target_table))


How to achieve this? Thank you for the help.

anthelix
  • 25
  • 4
  • Primary key cannot contain NULL values. Are you trying to check whether table is empty or not? – Igor Nikolaev Apr 25 '20 at 22:44
  • not only empty, I would like to know if the data is of good quality and for that I have to determine if there are null values in a column where I know there are none. – anthelix Apr 26 '20 at 17:16

1 Answers1

0

Are you looking for the SQL to check for NULLs in a column? If so

select count(1) from <table> where <column> is NULL; 

If other, please clarify.

Bill Weiner
  • 8,835
  • 2
  • 7
  • 18
  • If you think that `count(1)` is faster than `count(*)` then you are wrong. In fact it's slower –  Apr 25 '20 at 22:38
  • I do `SELECT COUNT(*) FROM (SELECT att.attname ...) AS foo IS NULL` but it's return the name of the primary column. I would know if null value within. If null value, my DAG do not go well. – anthelix Apr 26 '20 at 17:22