Great Expectations is open source software that helps teams promote analytic integrity by offering a unique approach to data pipeline testing. Pipeline tests are applied to data (instead of code) and at batch time (instead of compile or deploy time). Pipeline tests are like unit tests for datasets: they help you guard against upstream data changes and monitor data quality. In addition to pipeline testing, Great Expectations (GE) also provides data documentation and profiling.
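As a rough illustration of what "unit tests for datasets" means, here is a minimal plain-Python sketch of checks that run against a batch of data rather than against code. It deliberately does not use the Great Expectations API: the function names (`check_not_null`, `check_between`, `validate_batch`), the column names, and the sample batch are all hypothetical and made up for this example.

```python
# Plain-Python sketch of a "pipeline test": checks run against a batch of
# data at batch time. All names here are hypothetical and are NOT the
# Great Expectations API.

def check_not_null(batch, column):
    """Return the rows where `column` is missing or None."""
    return [row for row in batch if row.get(column) is None]


def check_between(batch, column, min_value, max_value):
    """Return rows whose non-null `column` value is outside [min_value, max_value]."""
    return [
        row for row in batch
        if row.get(column) is not None
        and not (min_value <= row[column] <= max_value)
    ]


def validate_batch(batch):
    """Run every check against one batch and report the failing rows per check."""
    failures = {
        "name_not_null": check_not_null(batch, "name"),
        "price_between_0_and_100": check_between(batch, "price", 0, 100),
    }
    return {
        "success": not any(failures.values()),
        "failures": failures,
    }


# A made-up batch in which an upstream change introduced an outlier price.
batch = [
    {"name": "Apple", "price": 1},
    {"name": "Apple", "price": 0.90},
    {"name": "Apple", "price": 1000},
]
result = validate_batch(batch)
```

Running the checks each time a new batch arrives, instead of once at deploy time, is what lets this style of test catch upstream data changes.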
Questions tagged [great-expectations]
131 questions
2
votes
2 answers
Passing AWS role to the application that uses default boto3 configs
I have an AWS setup that requires me to assume a role and get the corresponding credentials in order to write to S3. For example, to write with the AWS CLI, I need to use the --profile readwrite flag. If I write code myself with boto, I'd assume the role via STS, get…

Philipp_Kats
- 3,872
- 3
- 27
- 44
1
vote
1 answer
Using Great Expectations with Databricks Autoloader
I have implemented a data pipeline using Autoloader: bronze --> silver --> gold.
Now, while I do this, I want to perform some data quality checks, and for that I'm using the Great Expectations library.
However, I'm stuck with the error below when trying to…

Chhaya Vishwakarma
- 1,407
- 9
- 44
- 72
1
vote
1 answer
Great expectations: UserConfigurableProfiler raises a MetricResolutionError: unhashable type: 'dict'
I am trying to use a profiler to create expectations on certain data batches.
import great_expectations as gx
from great_expectations.core.batch import BatchRequest
from great_expectations.profile.user_configurable_profiler import…

Imad
- 2,358
- 5
- 26
- 55
1
vote
1 answer
Getting an error while installing the Great Expectations tool locally
ERROR: Could not install packages due to an OSError: [Errno 2] No such file or directory:…

Dinesh___Reddy
- 36
- 5
1
vote
0 answers
How can I specify a different database and schema to create temporary tables in Great Expectations?
Great Expectations creates temporary tables. I tried profiling data in my Snowflake lab. It worked because the role I was using could create tables in the schema that contained the tables I was profiling.
I tried to profile a table in a Snowflake…

Alex Woolford
- 4,433
- 11
- 47
- 80
1
vote
1 answer
Is Python Great Expectations compatible with PySpark?
I am implementing data quality checks using the Great Expectations library. Is this library compatible with PySpark, and does it run on multiple cores?

code_bug
- 355
- 1
- 12
1
vote
0 answers
Great-Expectations: How to connect to data stored in S3
s3fs==2022.8.2
great-expectations==0.15.26
It was not easy to find clear documentation and concrete examples for Great Expectations. After several tries, I succeeded in connecting to the S3 bucket:
import great_expectations as ge
from…

Adil Blanco
- 616
- 2
- 6
- 23
1
vote
1 answer
How to open the index.html file in Databricks or a browser?
I am trying to open the index.html file through Databricks. Can someone please let me know how to deal with it? I am trying to use GX with Databricks, and currently Databricks stores this file here:…

SeleniumUser
- 4,065
- 2
- 7
- 30
1
vote
1 answer
Creating an Expectation Suite with an Automated Profiler in Great Expectations
I am a newbie to Great Expectations and am trying to set it up, but I am facing the issue below while creating an Expectation Suite with an Automated Profiler.
C:\Users\user\great_expectations>great_expectations --v3-api suite new
Using v3 (Batch Request)…

SeleniumUser
- 4,065
- 2
- 7
- 30
1
vote
1 answer
Great Expectations with Azure and Databricks
I want to run great_expectations test suites against CSV files in my ADLS Gen2. On my ADLS, I have a container called "input" in which I have a file at input/GE/ind.csv. I use an InferredAssetAzureDataConnector. I was able to create and test/validate…

Vipin Dahake
- 11
- 2
1
vote
0 answers
Display whole rows in great_expectations dashboard
When an expectation fails, I cannot view on the dashboard (the data docs) the entire row (and not just the column value) which caused the failure. For example, if I have a failure because the maximum value of a numerical column is over a threshold,…

aprospero
- 529
- 3
- 14
1
vote
1 answer
Great Expectations Row Based Dimensions
I have data like this:
[ {
"name": "Apple",
"price": 1,
"type": "Food"
},
{
"name": "Apple",
"price": 0.90,
"type": "Food"
},
{
"name": "Apple",
"price": 1000,
…

steve76
- 302
- 2
- 9
1
vote
0 answers
Great Expectations: How to add a partition (column partition) in an Athena External Table in a checkpoint reference in GE?
The setup is GE v3 and I am using AWS Athena as a Data Source. However, I couldn't find a way to tell the "expectation" that the table is actually partitioned with a relative path in S3 like…

nandevers
- 191
- 8
1
vote
1 answer
Using great expectations for date validation
We are using great_expectations to validate data using Apache Spark.
We are unable to validate columns which have the DATE or DATETIME type.
We use the configuration below to check whether date entries in a table are recent.
[
…

Akhil Nambiar
- 315
- 3
- 18
1
vote
1 answer
How to integrate Great Expectations into an Airflow project
I'm trying to integrate Great Expectations into an Airflow project, but without success.
My question: is there some configuration I need to do?
Here are the steps I followed:
1- I generated the Great Expectations project by following this tutorial…

Adil Blanco
- 616
- 2
- 6
- 23