Great Expectations is an open-source tool that helps teams promote analytic integrity by offering a unique approach to data pipeline testing. Pipeline tests are applied to data (instead of code) and at batch time (instead of compile or deploy time). Pipeline tests are like unit tests for datasets: they help you guard against upstream data changes and monitor data quality. In addition to pipeline testing, GE also provides data documentation and profiling.
Questions tagged [great-expectations]
131 questions
1
vote
1 answer
Using great expectations with streamed data
I am using Great Expectations to test streaming data (I collect a sample into a batch and test the batch). The issue is that I cannot use the docs, because this would result in 100s of 1000s of HTML pages being generated. What I would like to do is use my…

Andy MGF
- 133
- 1
- 7
1
vote
0 answers
pyarrow.lib.ArrowNotImplementedError: Reading lists of structs from Parquet files not yet supported: paygw_etp_typs: list
I am using Great Expectations to test data within ETL pipelines. The data file I have is in Parquet format and contains some arrays; when I try to create a new suite or convert it into a readable format using pyarrow/fastparquet I am…

Azaz Ahmad
- 33
- 8
1
vote
1 answer
How to run Great Expectations expectations on multiple columns?
I want to use the Great Expectations testing suite to run the same validations on many columns. I see that there's a closed feature request to have this as a built-in expectation, but can this be done with a for-loop over the column names?
In…

crypdick
- 16,152
- 7
- 51
- 74
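Since GE expectations take the column name as an argument, a plain for-loop over the column names is usually enough. A minimal sketch of that loop pattern, using a hypothetical plain-Python stand-in check (`values_not_null`) in place of a live GE validator call so the example stays self-contained:

```python
# Sketch: apply the same check to many columns with a for-loop.
# In Great Expectations the call inside the loop would be e.g.
#   validator.expect_column_values_to_not_be_null(col)
# Here a hypothetical plain-Python stand-in plays that role.

rows = [
    {"id": 1, "name": "a", "score": 10},
    {"id": 2, "name": None, "score": 20},
]

def values_not_null(rows, column):
    """Stand-in for a GE expectation: True if `column` has no nulls."""
    return all(row[column] is not None for row in rows)

# The loop over column names — the same shape works with a GE validator.
results = {col: values_not_null(rows, col) for col in ["id", "name", "score"]}
```

Each column's result lands in `results`, mirroring how you would collect per-column expectation outcomes.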
0
votes
1 answer
Passing Airflow DAG Configuration Values to GreatExpectationsOperator Tasks
I have Airflow DAGs set up to run Great Expectations' checkpoints.yml alongside corresponding expectations.json files. These DAGs work well for full Data Quality tests.
Now, I'm in need of a DAG that can be triggered with configurations, such as…

Bazilio
- 5
- 2
0
votes
0 answers
Running Great Expectations in AWS Athena
Hello, I need some help running GX on AWS Athena.
Here is my config
conn_str = f"awsathena+rest://:@athena.{region_name}.amazonaws.com/{schema_name}?s3_staging_dir={s3_staging_dir}"
data = context.sources.add_sql(
    name="props",…

Muhammad Raihan Muhaimin
- 5,559
- 7
- 47
- 68
0
votes
0 answers
Great Expectations bad performance on PySpark DataFrame
We want to integrate data quality checks in our ETL pipelines, and tried this with Great Expectations. All our ETL is in PySpark.
For small datasets this works well, but for larger ones the performance of Great Expectations is really bad. On a…

gamezone25
- 288
- 2
- 10
0
votes
0 answers
Great Expectations integration with Slack
I am trying to integrate Great Expectations with Slack using Slack webhooks. I have followed the Great Expectations documentation by adding the validation_notification_slack_webhook variable in uncommitted/config_variables.yml and adding…
0
votes
0 answers
Great Expectations - expect_column_pair_values_to_not_be_equal
Is there a way to achieve the expectation expect_column_pair_values_to_not_be_equal without having to create a custom expectation? If not, what is the easiest way to create a custom expectation with the desired outcome, which is basically a…

PalBence
- 1
- 1
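The row-wise logic such an expectation would implement — for each row, the value in one column must differ from the value in another — can be sketched in plain Python (the column names below are illustrative, not from the question):

```python
# Sketch of the row-wise logic behind a "column pair values not equal"
# check: flag every row where the two columns hold the same value.

rows = [
    {"old_id": 1, "new_id": 2},
    {"old_id": 3, "new_id": 3},  # violates the expectation
]

def column_pair_not_equal(rows, column_a, column_b):
    """Return indices of rows where the pair IS equal (i.e. failures)."""
    return [i for i, row in enumerate(rows) if row[column_a] == row[column_b]]

failures = column_pair_not_equal(rows, "old_id", "new_id")
```

A custom GE expectation would wrap this comparison in the library's expectation machinery; the core predicate is no more than this.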
0
votes
1 answer
How can I add more than one S3 folder in GreatExpectations as my Data Asset?
I am attempting to create a Data Asset using the GreatExpectations library to point to all the files in subfolders under a parent folder. Here is a sample code snippet:
asset_name = "iceberg_asset"
s3_prefix = "folder_a/folder_b/folder_c/"…

Wild Tarzan
- 3
- 1
0
votes
0 answers
Unable to install Great Expectations on a Databricks notebook running Python
Summary
I'm unable to install GX on a Databricks Notebook (using Python).
As I'm stuck at the very first step of the guide, I'm unable to proceed with GX on DX. Any kind of help would be appreciated!
Environment
I'm trying to install GX in a…

Dror
- 12,174
- 21
- 90
- 160
0
votes
1 answer
Great Expectations using schema name in query for Redshift
I'm having an issue where, when Great Expectations builds a query string for a table_asset, it doesn't use the schema name.
import great_expectations as gx
from sqlalchemy_extras.sqlalchemy_utils import get_credentials, get_connection_string
# this is…

Bill
- 698
- 1
- 5
- 22
0
votes
1 answer
Conditional Expectations contains/like functionality and error (great expectations)
I am trying to add a conditional expectation that checks if the column "Value" is not equal to zero but only for a subset of the dataset where the column "Condition" contains the string "A".
I have two problems:
I don't know how to implement the…

yuki
- 3
- 2
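The conditional logic this question describes — filter to rows whose Condition column contains "A", then require Value to be non-zero on just that subset — can be sketched in plain Python. (GE's conditional expectations take a `row_condition` argument for this; whether a contains-style filter is accepted there is the crux of the question, so the sketch below only shows the intended semantics.)

```python
# Sketch: apply a check only to rows matching a condition.
# Mirrors the question: Value must be non-zero, but only where
# the Condition column contains the substring "A".

rows = [
    {"Condition": "A1", "Value": 5},
    {"Condition": "B2", "Value": 0},  # ignored: condition does not match
    {"Condition": "XA", "Value": 0},  # failure: matches, but value is zero
]

# First restrict to the conditioned subset, then run the check on it.
subset = [r for r in rows if "A" in r["Condition"]]
failures = [r for r in subset if r["Value"] == 0]
```

Rows outside the subset never reach the check, which is exactly what a `row_condition` is meant to guarantee.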
0
votes
0 answers
ImportError from Great Expectations
I've been using Great Expectations for a while, and recently, after rebuilding a Docker image, I started to get this error. The image builds fine, but when I try to run code and import the package this error appears.
import great_expectations as ge
…

Jed
- 1,823
- 4
- 20
- 52
0
votes
1 answer
Error when using Great Expectations to read CSV from Azure Data Lake: TypeError: read_csv() got an unexpected keyword argument 'connect_options'
I'm using Great Expectations locally and trying to connect it to Azure Data Lake.
I'm testing the connection by simply reading a CSV file from the data lake using Pandas.
The code produces an error: TypeError: read_csv() got an unexpected…

Toivo Mattila
- 377
- 1
- 9
0
votes
1 answer
How to configure a Great Expectations [Validation Result Store] to Snowflake
Is it possible to configure the Great Expectations Validation Result Store to use a Snowflake database? I found only a PostgreSQL variant in the documentation. I mean the possibility of writing validation results directly into a new table created behind the scenes…

unkind58
- 117
- 2
- 11