I writing a DataQualityOperator in a DAG. It should check if there's data in a Redshift table. To do this, I would like to check if the primary column contains null values. With sql, I found the name of the column Primary key. How do I check if it contains null values?(which means that the table is not good in my case).
class DataQualityOperator(BaseOperator):
check_template = """
SELECT a.attname
FROM pg_index i
JOIN pg_attribute a ON a.attrelid = i.indrelid
AND a.attnum = ANY(i.indkey)
WHERE i.indrelid = 'tablename'::regclass
AND i.indisprimary;
"""
def __init__ (self,redshift_conn_id = "", target_table="", *args, **kwargs):
super(...)
def execute(self, context):
self.log.info(f'DataQualityOperator processing {target_table}')
redshift = PostgresHook(postgres_conn_id=self.redshift_conn_id)
check_records = redshift.get_records(check_template.format(self.target_table))
How to achieve this? Thank you for the help.