I am trying to use SQLAlchemy and Great Expectations for testing data quality of datasets stored in Snowflake DB. The dataset is called candidates
and the column of interest is called first_name
.
However, when I run
sql_dataset = great_expectations.dataset.SqlAlchemyDataset(table_name="candidates", engine=engine, schema=creds["schema"])
sql_dataset.expect_column_values_to_be_in_set("first_name", ['Gather', 'Male'])
I am getting:
ProgrammingError: (snowflake.connector.errors.ProgrammingError) 002003 (42S02): SQL compilation error:
Object 'KEBOOLA_274.WORKSPACE_48777448.CANDIDATES' does not exist or not authorized.
[SQL: SELECT count(*) AS element_count, sum(CASE WHEN (first_name IS NULL) THEN %(param_1)s ELSE %(param_2)s END) AS null_count, sum(CASE WHEN (first_name NOT IN (%(first_name_1)s, %(first_name_2)s) AND first_name IS NOT NULL) THEN %(param_3)s ELSE %(param_4)s END) AS unexpected_count
FROM "WORKSPACE_48777448".candidates]
[parameters: {'param_1': 1, 'param_2': 0, 'first_name_1': 'Gather', 'first_name_2': 'Male', 'param_3': 1, 'param_4': 0}]
(Background on this error at: http://sqlalche.me/e/13/f405)
The issue is the table name is not enclosed in double quotes, hence Snowflake is looking for CANDIDATES
instead of candidates
. The same issue would arise with the column name should the table be found.
I've tested
sql_dataset = great_expectations.dataset.SqlAlchemyDataset(table_name="\"candidates\"",
engine=engine,
schema=creds["schema"])
bu Snowflake is then looking for a table called KEBOOLA_274.WORKSPACE_48777448.""candidates""
.
I know SQLAlchemy by default considers lowercase object names to be case insensitive, therefore it doesn't enclose the names. Is there any way around it please?