I'm trying to use the Python data validation package Great Expectations. Specifically, I'm using the GreatExpectationsOperator to run an expectation suite against a particular datasource (a PostgreSQL datasource):
from great_expectations_provider.operators.great_expectations import GreatExpectationsOperator

my_ge_task = GreatExpectationsOperator(
    task_id='my_task',
    expectation_suite_name='suite.error',
    batch_kwargs={
        'table': 'data_quality',
        'datasource': 'data_quality_datasource',
        'query': "SELECT * FROM data_quality WHERE batch='abc';"
    },
    data_context_root_dir=ge_root_dir
)
What I'm trying to figure out is how to store and retrieve my datasource credentials. For other operations against PostgreSQL, I store the database credentials in an Airflow PostgreSQL connection and interact with the database through the PostgreSQL hook. With Great Expectations, however, the PostgreSQL connection details live inside the Great Expectations context, in config_variables.yaml. I have tried setting environment variables in my Dockerfile and referencing them as the credentials, and that works, but I'm looking for a cleaner approach, ideally reusing my existing Airflow PostgreSQL connection details for the datasource.
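For context, my current workaround relies on Great Expectations' environment-variable substitution in config_variables.yaml (the ${VAR} syntax). It looks roughly like this, with the variable names being placeholders for whatever I export in the Dockerfile:

# config_variables.yaml -- values are substituted from environment variables
data_quality_datasource:
  drivername: postgresql
  host: ${POSTGRES_HOST}
  port: ${POSTGRES_PORT}
  username: ${POSTGRES_USER}
  password: ${POSTGRES_PASSWORD}
  database: ${POSTGRES_DB}

What I'm imagining instead is something like the sketch below: read the credentials out of the existing Airflow connection and expose them as the environment variables that config_variables.yaml substitutes, so the credentials only live in one place. This is just an illustration of the idea, not working code, and the connection id my_postgres_conn is a placeholder:

import os
from airflow.hooks.base_hook import BaseHook  # airflow.hooks.base in Airflow 2

def export_postgres_credentials(conn_id="my_postgres_conn"):
    # Pull the existing Airflow connection and map its fields onto the
    # environment variables referenced in config_variables.yaml.
    conn = BaseHook.get_connection(conn_id)
    os.environ["POSTGRES_HOST"] = conn.host
    os.environ["POSTGRES_PORT"] = str(conn.port)
    os.environ["POSTGRES_USER"] = conn.login
    os.environ["POSTGRES_PASSWORD"] = conn.password
    os.environ["POSTGRES_DB"] = conn.schema

I could call something like this at the top of my DAG file so the variables are set before the operator loads the Great Expectations context, but that feels hacky, so I'm wondering whether there's a supported way to point a Great Expectations datasource at an Airflow connection directly.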
There doesn't seem to be much detail online on how to accomplish this, so any help would be very much appreciated.
Thanks,