0

I'm having an issue where when great expectations builds a query string to a table_asset it doesn't use the schema name.

import great_expectations as gx
from sqlalchemy_extras.sqlalchemy_utils import get_credentials, get_connection_string

# this is a set of calls to our teams functions, don't worry too much about it
# the connection string will look like: 'redshift+psycopg2://USER:PASS@HOST:PORT/DB_NAME'
def get_gx_datasource(gx_context, db_name):
    settings = get_credentials().get(db_name)
    redshift_connection_string = str(get_connection_string(settings))
    return gx_context.sources.add_sql(connection_string=redshift_connection_string, name=db_name)


gx_context = gx.get_context()
expectation_suite = gx_context.add_expectation_suite(expectation_suite_name='my_suite')
gx_datasource = get_gx_datasource(gx_context, db_name='db_name)
gx_datasource.add_table_asset(
    name='bar',
    table_name='bar',
    schema_name='foo'
)

asset = gx_datasource.get_asset('bar')
asset.add_splitter_mod_integer(column_name='my_col', mod=10)
batch_request = asset.build_batch_request()
batches = gx_datasource.get_batch_list_from_batch_request(batch_request)

for batch in batches:
    print(batch.batch_spec)

The error I get is something like:

sqlalchemy.exc.ProgrammingError: (psycopg2.errors.UndefinedTable) relation "bar" does not exist

[SQL: SELECT distinct(mod(CAST(my_col AS INTEGER), %(mod_1)s)) AS distinct_1 
FROM bar]
[parameters: {'mod_1': 10}]

But while that query doesn't work when testing against my connection to redshift, the query does work if I change it to add the schema name like foo.bar.

But nothing I do seems to work.

Not this:

gx_datasource.add_table_asset(
    name='bar',
    table_name='foo.bar',
    schema_name='foo'
)

Or this:

gx_datasource.add_table_asset(
    name='bar',
    table_name='foo.bar'
)

And not directly editing the data in the table_asset object itself.

Am I missing something here?

Bill
  • 698
  • 1
  • 5
  • 22
  • 1
    It may not be what you are looking for but a Redshift session has a “search path” setting that can be used to set which schemas to search when one isn’t provided in the query. See: https://docs.aws.amazon.com/redshift/latest/dg/r_search_path.html – Bill Weiner May 20 '23 at 18:44
  • @BillWeiner Great Expectations doesn't seem to have a way to allow that. If the connection is persistent, I'm having trouble locating the sqlalchemy object for it. Is there a way to set something like `ALTER USER bill WITH DEFAULT_SCHEMA = foo;` at the server level? I don't want to edit the search path for all users just yet. – Bill May 22 '23 at 14:31
  • 1
    Yes you can alter a user to set their search_path - ALTER USER bill SET search_path TO ‘$user’, public, foo; but you will need to be a superuser to perform this. – Bill Weiner May 22 '23 at 20:30
  • Thanks for confirming! I had found something online about this and it seemed to work. – Bill May 22 '23 at 20:50
  • Feel free to write that up in the answers! – Bill May 22 '23 at 20:59

1 Answers1

0

It may not be what you are looking for but a Redshift session has a “search path” setting that can be used to set which schemas to search when one isn’t provided in the query. See: docs.aws.amazon.com/redshift/latest/dg/r_search_path.html

You can alter a user to set their default search_path such as:

ALTER USER bill SET search_path TO ‘$user’, public, foo;

but you will need to be a superuser to perform this.

Bill Weiner
  • 8,835
  • 2
  • 7
  • 18