
I am trying to use SQLAlchemy and Great Expectations to test the data quality of datasets stored in Snowflake. The dataset is called candidates and the column of interest is called first_name.

However, when I run

sql_dataset = great_expectations.dataset.SqlAlchemyDataset(table_name="candidates", engine=engine, schema=creds["schema"])
sql_dataset.expect_column_values_to_be_in_set("first_name", ['Gather', 'Male'])

I am getting:

ProgrammingError: (snowflake.connector.errors.ProgrammingError) 002003 (42S02): SQL compilation error:
Object 'KEBOOLA_274.WORKSPACE_48777448.CANDIDATES' does not exist or not authorized.
[SQL: SELECT count(*) AS element_count, sum(CASE WHEN (first_name IS NULL) THEN %(param_1)s ELSE %(param_2)s END) AS null_count, sum(CASE WHEN (first_name NOT IN (%(first_name_1)s, %(first_name_2)s) AND first_name IS NOT NULL) THEN %(param_3)s ELSE %(param_4)s END) AS unexpected_count 
FROM "WORKSPACE_48777448".candidates]
[parameters: {'param_1': 1, 'param_2': 0, 'first_name_1': 'Gather', 'first_name_2': 'Male', 'param_3': 1, 'param_4': 0}]
(Background on this error at: http://sqlalche.me/e/13/f405)

The issue is that the table name is not enclosed in double quotes, so Snowflake looks for CANDIDATES instead of candidates. The same issue would arise with the column name if the table were found.
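To illustrate what I mean at the SQL level, here is a minimal sketch run by hand against the same engine as above: Snowflake folds unquoted identifiers to upper case, so only the quoted form matches the lowercase table.

from sqlalchemy import text

with engine.connect() as conn:
    # conn.execute(text("SELECT count(*) FROM candidates"))   # fails: resolved as CANDIDATES
    conn.execute(text('SELECT count(*) FROM "candidates"'))    # works: quoting preserves the lowercase name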

I've tested

sql_dataset = great_expectations.dataset.SqlAlchemyDataset(table_name="\"candidates\"",
                                                           engine=engine,
                                                           schema=creds["schema"])

but Snowflake is then looking for a table called KEBOOLA_274.WORKSPACE_48777448.""candidates"".

I know that SQLAlchemy by default treats lowercase object names as case-insensitive and therefore does not quote them. Is there any way around this, please?
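For completeness, this is the default SQLAlchemy behaviour I am referring to; the snippet below is a self-contained sketch (it does not touch Snowflake at all) showing that lowercase names are rendered unquoted while mixed-case names are quoted.

from sqlalchemy import Column, MetaData, String, Table, select

metadata = MetaData()
lower = Table("candidates", metadata, Column("first_name", String))
mixed = Table("Candidates", metadata, Column("first_name", String))

print(select([lower.c.first_name]))  # candidates / first_name rendered without quotes
print(select([mixed.c.first_name]))  # "Candidates" rendered with quotes (case-sensitive name)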

RadRuss
  • When you created the `candidates` table did you create it with `"` around it? If you didn't then there shouldn't be any issues. The only thing that is being enclosed in quotes based on your logs is the schema ...`FROM "WORKSPACE_48777448".candidates`. So, if when you created your schema, you enclosed it in `"` AND it was lowercase then you'll need to lowercase the schema in your config file. – Simon D Sep 11 '20 at 11:16
  • I created it with a lowercase name - the actual name of the table is `candidates`, hence I need to send a query that goes like `select * from "candidates"`. – RadRuss Sep 11 '20 at 15:19
  • Rename your table in Snowflake to CANDIDATES or just without the quotes, and you're all set. It's not a good practice to wrap table names in quotes in Snowflake for this very reason. – Mike Walton Sep 13 '20 at 00:13
  • Unfortunately this is not an option in my use case; I am working on an extension of existing software, and adjusting this would require significant changes. And even if I managed to do it, the same problem would arise with the column name (the query would be parsed as `select col_name from TABLE_NAME` and Snowflake would look for the uppercased version of the column name). – RadRuss Sep 14 '20 at 06:15

2 Answers


As Mike Walton has suggested in the comments, the solution was to uppercase all table names before using the great_expectations package.
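A rough sketch of that workaround, using the same engine as in the question; the ALTER TABLE statements are just one way to do the rename.

from sqlalchemy import text

# Rename the lowercase objects once so that unquoted identifiers resolve correctly afterwards.
with engine.connect() as conn:
    conn.execute(text('ALTER TABLE "candidates" RENAME TO CANDIDATES'))
    conn.execute(text('ALTER TABLE CANDIDATES RENAME COLUMN "first_name" TO FIRST_NAME'))

# The original snippet then works unchanged: the unquoted lowercase names that
# SQLAlchemy emits now match the uppercase objects in Snowflake.
sql_dataset = great_expectations.dataset.SqlAlchemyDataset(table_name="candidates",
                                                           engine=engine,
                                                           schema=creds["schema"])
sql_dataset.expect_column_values_to_be_in_set("first_name", ['Gather', 'Male'])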

RadRuss

I work for Superconductive as a developer of Great Expectations, and I submitted a fix for this a few weeks back! You can now specify the use_quoted_name property in your batch_kwargs. If this is set to True, your table and column names are treated as case-sensitive, so you should have no trouble accessing lowercase table names, though you will also need to make sure that the case of your other table and column names is specified appropriately.
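A rough sketch of what that can look like when loading a batch through a DataContext; the datasource name and expectation suite name below are placeholders for whatever is configured in your project.

import great_expectations as ge

context = ge.data_context.DataContext()

batch_kwargs = {
    "datasource": "my_snowflake_datasource",  # placeholder: your configured datasource
    "schema": creds["schema"],
    "table": "candidates",
    "use_quoted_name": True,                  # quote identifiers, i.e. treat them as case-sensitive
}

batch = context.get_batch(batch_kwargs, "candidates.warning")  # placeholder suite name
batch.expect_column_values_to_be_in_set("first_name", ["Gather", "Male"])

If you construct SqlAlchemyDataset directly as in the question, check whether the Great Expectations version you are on also accepts use_quoted_name as a keyword argument there.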

Tal Gluck