1

I am trying to access the snowflake datasource using "great_expectations" library.

The following is what I tried so far:

from ruamel import yaml

import great_expectations as ge
from great_expectations.core.batch import BatchRequest, RuntimeBatchRequest

context = ge.get_context()



datasource_config = {
    "name": "my_snowflake_datasource",
    "class_name": "Datasource",
    "execution_engine": {
        "class_name": "SqlAlchemyExecutionEngine",
        "connection_string": "snowflake://myusername:mypass@myaccount/myDB/myschema?warehouse=mywh&role=myadmin",
    },
    "data_connectors": {
        "default_runtime_data_connector_name": {
            "class_name": "RuntimeDataConnector",
            "batch_identifiers": ["default_identifier_name"],
        },
        "default_inferred_data_connector_name": {
            "class_name": "InferredAssetSqlDataConnector",
            "include_schema_name": True,
        },
    },
}

print(context.test_yaml_config(yaml.dump(datasource_config)))

I initiated great_expectation before executing above code:

great_expectations init

but I am getting the error below:

great_expectations.exceptions.exceptions.DatasourceInitializationError: Cannot initialize datasource my_snowflake_datasource, error: 'NoneType' object has no attribute 'create_engine'

What am I doing wrong?

Anthon
  • 69,918
  • 32
  • 186
  • 246
cloud_hari
  • 147
  • 1
  • 8

1 Answers1

1

Your configuration seems to be ok, corresponding to the example here.

If you look at the traceback you should notice that the error propagates starting at the file great_expectations/execution_engine/sqlalchemy_execution_engine.py in your virtual environment.

The actual line where the error occurs is:

            self.engine = sa.create_engine(connection_string, **kwargs)

And if you search for that sa at the top of that file:

import sqlalchemy as sa

make_url = import_make_url()
except ImportError:
        sa = None

So sqlalchemy is not installed, which you don't get automatically in your environement if you install greate_expectiations. The thing to do is to install snowflake-sqlalchemy, since you want to use sqlalchemy's snowflake plugin (assumption based on your connection_string).

/your/virtualenv/bin/python -m pip install snowflake-sqlalchemy

After that you should no longer get an error, it looks like test_yaml_config is waiting for the connection to time out.

What worries me greatly is the documented use of a deprecated API of ruamel.yaml. The function ruamel.yaml.dump is going to be removed in the near future, and you should use the .dump() method of a ruamel.yaml.YAML() instance.

You should use the following code instead:

import sys
from ruamel.yaml import YAML

import great_expectations as ge
context = ge.get_context()

datasource_config = {
    "name": "my_snowflake_datasource",
    "class_name": "Datasource",
    "execution_engine": {
        "class_name": "SqlAlchemyExecutionEngine",
        "connection_string": "snowflake://myusername:mypass@myaccount/myDB/myschema?warehouse=mywh&role=myadmin",
    },
    "data_connectors": {
        "default_runtime_data_connector_name": {
            "class_name": "RuntimeDataConnector",
            "batch_identifiers": ["default_identifier_name"],
        },
        "default_inferred_data_connector_name": {
            "class_name": "InferredAssetSqlDataConnector",
            "include_schema_name": True,
        },
    },
}

yaml = YAML()

yaml.dump(datasource_config, sys.stdout, transform=context.test_yaml_config)

I'll make a PR for great-excpectations to update their documentation/use of ruamel.yaml.

Anthon
  • 69,918
  • 32
  • 186
  • 246