Great Expectations is open source software that helps teams promote analytic integrity by offering a distinctive approach to data pipeline testing. Pipeline tests are applied to data (instead of code) and at batch time (instead of compile or deploy time). Pipeline tests are like unit tests for datasets: they help you guard against upstream data changes and monitor data quality. In addition to pipeline testing, GE also provides data documentation and profiling.
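A minimal sketch of what such a pipeline test looks like, assuming the classic Dataset API and a hypothetical orders.csv input file:

import great_expectations as ge

# ge.read_csv returns a pandas-backed GE Dataset ("orders.csv" is hypothetical).
df = ge.read_csv("orders.csv")

# The assertion runs against the data batch itself, at batch time:
result = df.expect_column_values_to_not_be_null("order_id")
print(result.success)  # True if this batch passes the expectation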
Questions tagged [great-expectations]
131 questions
0 votes · 0 answers
Great Expectations - Result validation for row_count and column_freshness
I would like to validate results for row count and column freshness on some data on AWS. I am using a check_config.json file to configure the checks. I use Terraform to create a Glue job that runs the check and writes the result to DynamoDB. The result in…

Fortune Musara · 13
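Both checks in this question map onto built-in expectations; a minimal sketch, assuming a GE Dataset named batch and a hypothetical timestamp column updated_at:

import datetime

# Row count within an expected range:
batch.expect_table_row_count_to_be_between(min_value=1_000, max_value=50_000)

# Freshness: the newest timestamp must be no older than one day.
batch.expect_column_max_to_be_between(
    "updated_at",
    min_value=datetime.datetime.utcnow() - datetime.timedelta(days=1),
)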
0 votes · 0 answers
Python Great Expectations memory error for unique check
I am implementing data quality checks using the Great Expectations library. The dataset is 80 GB and has 513,749,893 rows.
Below is the code I am using to run a uniqueness check on one of the columns:
import great_expectations as…

code_bug · 355
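One way to avoid loading 80 GB into pandas is to run the same check through Spark; a sketch, assuming a Spark cluster and a hypothetical parquet copy of the data:

from pyspark.sql import SparkSession
from great_expectations.dataset import SparkDFDataset

spark = SparkSession.builder.getOrCreate()
sdf = spark.read.parquet("s3://my-bucket/my-dataset/")  # hypothetical location

# SparkDFDataset pushes the uniqueness check down to Spark,
# so it is not limited by single-machine pandas memory.
gdf = SparkDFDataset(sdf)
result = gdf.expect_column_values_to_be_unique("my_column")  # placeholder column
print(result.success)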
0 votes · 0 answers
Custom Great Expectations rule
Requirements:
- create custom column-map and column-pair expectations
- use the custom expectations with the great_expectations library
How can the custom expectations be used the same way core expectations are, without initializing ge data…

Gitesh Shinde · 68
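With the classic (v2) API, a custom column-map expectation can be defined by subclassing a Dataset and decorating a per-row boolean function; a minimal sketch (the class and expectation names here are made up):

import pandas as pd
from great_expectations.dataset import PandasDataset, MetaPandasDataset

class CustomPandasDataset(PandasDataset):
    _data_asset_type = "CustomPandasDataset"

    # Each row passes if the mapped condition evaluates to True.
    @MetaPandasDataset.column_map_expectation
    def expect_column_values_to_be_even(self, column):
        return column % 2 == 0

# Wrapping a pandas DataFrame in the subclass exposes the custom
# expectation alongside the core ones:
df = CustomPandasDataset(pd.DataFrame({"n": [2, 4, 5]}))
print(df.expect_column_values_to_be_even("n").success)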
0 votes · 0 answers
Getting "ValidationMetricIdentifier tuple must have at least six components" when setting up a code-based Great Expectations data context
I am trying to set up a code-based data context, with a hosted static site, having all stores in Azure. Here are the config and test code:
connection_str =
data_context_config = DataContextConfig(
config_version=2,
…

Pablo Beltran · 41
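For comparison, a minimal in-code context that instantiates cleanly, assuming in-memory store defaults (the Azure-backed stores in the question would replace InMemoryStoreBackendDefaults with explicit store configs):

from great_expectations.data_context import BaseDataContext
from great_expectations.data_context.types.base import (
    DataContextConfig,
    InMemoryStoreBackendDefaults,
)

# store_backend_defaults wires up all required stores at once,
# which avoids hand-building each store entry.
data_context_config = DataContextConfig(
    config_version=2,
    datasources={},
    store_backend_defaults=InMemoryStoreBackendDefaults(),
)
context = BaseDataContext(project_config=data_context_config)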
0 votes · 1 answer
Programmatic configuration for Great Expectations
I'm looking into integrating a validation framework into an existing PySpark project. There are a lot of examples of how to configure Great Expectations using JSON/YAML files in the official documentation. However, in my case table schemas are defined as…

ollik1 · 4,460
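For a fully programmatic setup, the v3 API also accepts datasource configuration as a plain Python dict rather than YAML; a sketch, assuming an existing context object (all names are placeholders):

datasource_config = {
    "name": "my_spark_datasource",
    "class_name": "Datasource",
    "execution_engine": {"class_name": "SparkDFExecutionEngine"},
    "data_connectors": {
        "runtime_connector": {
            # RuntimeDataConnector lets you validate in-memory DataFrames
            # without any file-based asset discovery.
            "class_name": "RuntimeDataConnector",
            "batch_identifiers": ["batch_id"],
        }
    },
}
context.add_datasource(**datasource_config)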
0 votes · 1 answer
Provide AWS credentials to the Airflow GreatExpectationsOperator
I would like to use the GreatExpectationsOperator to perform data quality validations.
The validation result data should be stored in S3.
I don't see an option to pass an Airflow connection name to the GE operator, and the AWS credentials in my…

Itai Sevitt · 140
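One hedged workaround: GE's S3 store backend goes through boto3, and boto3 falls back to the standard environment variables when no explicit credentials are configured, so exporting them in the worker environment before the operator runs can work (values below are placeholders; an instance role or secrets backend is preferable in a real deployment):

import os

# boto3 (used by GE's S3 store backend) reads these if nothing
# else in its credential chain is configured:
os.environ["AWS_ACCESS_KEY_ID"] = "<access key>"        # placeholder
os.environ["AWS_SECRET_ACCESS_KEY"] = "<secret key>"    # placeholder
os.environ["AWS_DEFAULT_REGION"] = "eu-west-1"          # placeholder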
0 votes · 1 answer
Specifying evaluation parameters for ExpectationConfiguration object in Great Expectations
I am trying to find out how to specify an evaluation parameter when I create an ExpectationConfiguration object.
Steps to reproduce the behavior:
I have followed the instructions for creating expectations using…

femibyte · 3,317
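Evaluation parameters are expressed with the $PARAMETER syntax inside the kwargs of the ExpectationConfiguration; a sketch (the parameter name is illustrative):

from great_expectations.core.expectation_configuration import (
    ExpectationConfiguration,
)

config = ExpectationConfiguration(
    expectation_type="expect_table_row_count_to_equal",
    kwargs={
        # Resolved at validation time from the suite's evaluation parameters.
        "value": {"$PARAMETER": "upstream_row_count"},
    },
)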
0 votes · 1 answer
Adding jars to the great_expectations Spark session
Setup:
My data is on Azure ADLS Gen2
I want to use the great_expectations package to test my data quality.
I am using the InferredAssetAzureDataConnector data_connector to create my data source (this works, I can see my files on the ADLS during…

Cribber · 2,513
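Because GE's Spark engine acquires its session with getOrCreate(), one approach is to create the session with the needed packages first so GE picks up the existing session and its jars; a sketch (the package coordinates are an example):

from pyspark.sql import SparkSession

# Create the session before great_expectations does; getOrCreate()
# inside GE's Spark execution engine should then reuse it.
spark = (
    SparkSession.builder
    .config("spark.jars.packages", "org.apache.hadoop:hadoop-azure:3.3.1")
    .getOrCreate()
)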
0 votes · 1 answer
great_expectations: create a datasource of CSV files on ADLS Gen2
I want to run great_expectations test suites against CSV files in my ADLS Gen2. On my ADLS, I have a container called "data" in which I have a file at mypath/test/mydata.csv. I use an InferredAssetAzureDataConnector. I was able to create and…

Cribber · 2,513
0 votes · 2 answers
Great Expectations list total unique values
I have run the Great Expectations check expect_column_values_to_be_unique on one of the columns. It produced the result shown below. In total there are 62 duplicates, but the output list returns only 20 elements. How to retrieve all…

code_bug · 355
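The 20-element cap is the default partial result format; requesting the "COMPLETE" result format returns the full unexpected list. A sketch, assuming a GE Dataset df and a placeholder column name:

result = df.expect_column_values_to_be_unique(
    "my_column",
    result_format={"result_format": "COMPLETE"},
)
# "unexpected_list" now holds every failing value, not just the
# first 20 kept in "partial_unexpected_list".
all_duplicates = result.result["unexpected_list"]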
0 votes · 1 answer
Great Expectations v3 API in AWS Glue 3.0
I'm trying to run a validation in the pipeline using Great Expectations on AWS Glue 3.0.
Here's my initial attempt to create the data context at runtime, based on their docs:
def create_context():
logger.info("Create DataContext Config.")
…

darkCoffy · 103
0 votes · 1 answer
Spark Compatible Data Quality Framework for Narrow Data
I'm trying to find an appropriate data quality framework for very large amounts of time-series data in a narrow format.
Imagine billions of rows of data that look kind of like…

Valentin · 641
0 votes · 1 answer
Great Expectations error when using the save_expectations_config() function
I tried to run this code using PyCharm:
import great_expectations as…

Tasbeeh · 45
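save_expectations_config() was renamed around the 0.9 release line; on recent versions the equivalent call is save_expectation_suite(). A sketch:

import great_expectations as ge

df = ge.read_csv("data.csv")  # hypothetical input
df.expect_column_values_to_not_be_null("id")  # placeholder column

# Current name for persisting the accumulated expectations to JSON:
df.save_expectation_suite("my_suite.json")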
0 votes · 1 answer
How to run Great Expectations on AWS Lambda
I am trying to use great-expectations, i.e., run expectation suites within an AWS Lambda function.
When I try to install the packages in requirements.txt, I get an error regarding jupyter…

MariaMadalina · 479
0 votes · 2 answers
CSV file can't be read using Great Expectations
When I run this code in PyCharm using Python:
import great_expectations as ge
df=ge.read_csv("C:\Users\TasbeehJ\data\yellow_tripdata_2019-02.csv")
it gave me this error:
SyntaxError: (unicode error) 'unicodeescape' codec can't decode bytes in…

Tasbeeh · 45
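That SyntaxError comes from Python parsing \U in the Windows path as a unicode escape, not from great_expectations itself; a raw string (or forward slashes) fixes it:

import great_expectations as ge

# Raw string: backslashes are kept literally, so "\U" in "C:\Users"
# is no longer treated as a unicode escape sequence.
df = ge.read_csv(r"C:\Users\TasbeehJ\data\yellow_tripdata_2019-02.csv")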