Great Expectations is open-source software that helps teams promote analytic integrity by offering a unique approach to data pipeline testing. Pipeline tests are applied to data (instead of code) and at batch time (instead of compile or deploy time). Pipeline tests are like unit tests for datasets: they help you guard against upstream data changes and monitor data quality. In addition to pipeline testing, GE also provides data documentation and profiling.
Questions tagged [great-expectations]
131 questions
3
votes
4 answers
Great Expectations - Run Validation over specific subset of a PostgreSQL table
I am fairly new to Great Expectations and have a question. Essentially I have a PostgreSQL database, and every time I run my data pipeline, I want to validate a specific subset of the PostgreSQL table based on some key. E.g. if the data pipeline…

adan11
- 647
- 1
- 7
- 24
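For the PostgreSQL-subset question above, one lightweight approach (a sketch, not the only GE-native option) is to pull only the keyed slice out of PostgreSQL with SQLAlchemy/pandas and validate that slice with the legacy `ge.from_pandas` wrapper. The connection string, table, key column, and expectation below are placeholders.

```python
import great_expectations as ge
import pandas as pd
from sqlalchemy import create_engine

# Placeholder connection string and run key; replace with your own.
engine = create_engine("postgresql+psycopg2://user:password@localhost:5432/mydb")
run_key = "2023-06-01"

# Pull only the subset of the table that belongs to this pipeline run.
subset_df = pd.read_sql(
    "SELECT * FROM my_table WHERE batch_key = %(key)s",
    engine,
    params={"key": run_key},
)

# Wrap the subset in a Great Expectations dataset and validate it.
ge_df = ge.from_pandas(subset_df)
ge_df.expect_column_values_to_not_be_null("id")  # placeholder expectation
results = ge_df.validate()
print(results.success)
```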
3
votes
2 answers
How to Save Great Expectations results to File From Apache Spark - With Data Docs
I have successfully created a Great_Expectations result and I would like to output the results of the expectation to an HTML file.
There are a few links highlighting how to show the results in human-readable form using what is called 'Data Docs'…

Patterson
- 1,927
- 1
- 19
- 56
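Outside of full Data Docs, older GE releases expose the renderers that Data Docs itself uses, so a single validation result can be turned into a standalone HTML file. A rough sketch; the class paths are from the legacy API and may differ in your version, and the tiny pandas result stands in for the Spark validation result from the question.

```python
import great_expectations as ge
import pandas as pd
from great_expectations.render.renderer import ValidationResultsPageRenderer
from great_expectations.render.view import DefaultJinjaPageView

# Minimal validation result to render; in the question this would come from
# a SparkDFDataset.validate() call or a checkpoint run instead.
ge_df = ge.from_pandas(pd.DataFrame({"id": [1, 2, None]}))
ge_df.expect_column_values_to_not_be_null("id")
validation_result = ge_df.validate()

# Render the result with the same renderer Data Docs uses, then save as HTML.
document_model = ValidationResultsPageRenderer().render(validation_result)
html = DefaultJinjaPageView().render(document_model)

with open("validation_results.html", "w") as f:
    f.write(html)
```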
3
votes
1 answer
Use a pickled pandas dataframe as a data asset in great_expectations
Probably a very simple question, but I could not figure it out from the great_expectations documentation. I would like to run some tests on a pandas dataframe that is stored locally as a pickled file ('.pkl').
When I ran great_expectations…

Manu
- 58
- 6
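A pickled DataFrame does not need a special data asset type: it can be loaded with pandas and then handed to GE. A minimal sketch using the legacy `from_pandas` wrapper; the file name and expectation are placeholders.

```python
import great_expectations as ge
import pandas as pd

# Load the locally pickled DataFrame, then wrap it for validation.
df = pd.read_pickle("my_data.pkl")  # placeholder path
ge_df = ge.from_pandas(df)

ge_df.expect_column_to_exist("some_column")  # placeholder expectation
print(ge_df.validate().success)
```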
2
votes
0 answers
Great_Expectations - Constructor public org.apache.spark.SparkConf(boolean) is not whitelisted
I am using Great_Expectations in Databricks.
I am using a shared cluster and the runtime version is
13.1 Beta (includes Apache Spark 3.4.0, Scala 2.12)
py4j version 0.10.9.7
pyspark version 3.4.0
Here is my code:
%pip install…

Milind Keer
- 21
- 2
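On shared-access-mode clusters, Databricks restricts py4j calls such as constructing a new SparkConf, which is why code that asks GE to build its own Spark session can fail with this error. One workaround sketch, assuming the legacy SparkDFDataset API, is to reuse the session Databricks already provides and only wrap existing DataFrames, so GE never constructs a SparkConf; switching the cluster to single-user access mode is the other commonly reported fix.

```python
from great_expectations.dataset import SparkDFDataset

# Reuse the SparkSession that Databricks already created (available as `spark`
# in notebooks) instead of letting GE build its own SparkConf/SparkSession.
df = spark.read.table("my_catalog.my_schema.my_table")  # placeholder table

ge_df = SparkDFDataset(df)  # wraps the existing DataFrame; no new SparkConf
ge_df.expect_column_values_to_not_be_null("id")  # placeholder expectation
print(ge_df.validate().success)
```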
2
votes
2 answers
Cannot run Great Expectations quickstart
I am trying to use Great Expectations (a Python data quality framework). I ran the quickstart after installing GX on WSL2 with Python 3.9.16.
The quickstart code can be found here: https://docs.greatexpectations.io/docs/tutorials/quickstart/
I am…

BuahahaXD
- 609
- 2
- 8
- 24
2
votes
2 answers
How to use Kedro with Great-expectations?
I am using Kedro to create a pipeline for ETL purposes, and column-specific validations are being done using Great-Expectations. There is a hooks.py file listed in the Kedro documentation here. This hook is registered as per the instructions mentioned on…

Dhaval Thakkar
- 43
- 10
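For reference, a Kedro hook that runs GE checks on node inputs typically looks roughly like the sketch below; the hook body and the expectation are illustrative, not the documented hooks.py verbatim, and the hook is registered through the HOOKS tuple in settings.py.

```python
import great_expectations as ge
import pandas as pd
from kedro.framework.hooks import hook_impl


class GreatExpectationsHooks:
    @hook_impl
    def before_node_run(self, inputs: dict):
        # Validate every pandas DataFrame that is about to enter a node.
        for dataset_name, data in inputs.items():
            if isinstance(data, pd.DataFrame):
                ge_df = ge.from_pandas(data)
                result = ge_df.expect_column_values_to_not_be_null("id")  # placeholder check
                if not result.success:
                    raise ValueError(f"Validation failed for input '{dataset_name}'")


# In settings.py:  HOOKS = (GreatExpectationsHooks(),)
```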
2
votes
0 answers
Great Expectations Validator and Checkpoint only seem to process a single file, not all in the Data Asset
I am using Great Expectations to create data quality tests on intermediate featuresets in a pyspark featureset-generation pipeline. The intermediate featuresets are therefore stored in thousands of .snappy.parquet files to support the distributed…

Stod
- 63
- 1
- 6
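If the goal is to validate the featureset as a whole rather than file by file, one workaround sketch is to let Spark read the entire directory as a single DataFrame and validate that one batch with the legacy SparkDFDataset wrapper; the path and check below are placeholders.

```python
from pyspark.sql import SparkSession
from great_expectations.dataset import SparkDFDataset

spark = SparkSession.builder.getOrCreate()

# Read every .snappy.parquet part file in the featureset directory as one DataFrame.
features_df = spark.read.parquet("/data/intermediate/featureset_x/")  # placeholder path

# Validate the combined batch instead of individual part files.
ge_df = SparkDFDataset(features_df)
ge_df.expect_column_values_to_be_between("feature_1", min_value=0, max_value=1)  # placeholder
print(ge_df.validate().success)
```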
2
votes
1 answer
Great Expectations and Spark: get full list of include_unexpected_rows
I'm currently testing my datasets and so far so good; unfortunately I'm unable to get the rows that don't match my expectations.
I'm using the SparkDFExecutionEngine execution engine.
For example…

macdrai
- 586
- 5
- 6
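Two sketches that are commonly suggested for this: ask GE for a richer result_format (the include_unexpected_rows key is documented for some versions, so treat it as an assumption for yours), or simply re-apply the negated condition in plain PySpark to materialise the failing rows.

```python
from pyspark.sql import SparkSession, functions as F
from great_expectations.dataset import SparkDFDataset

spark = SparkSession.builder.getOrCreate()
df = spark.createDataFrame([(1, 5), (2, 500)], ["id", "amount"])

ge_df = SparkDFDataset(df)

# Option 1 (assumption: supported by your GE version): request unexpected rows
# in the result payload via result_format.
result = ge_df.expect_column_values_to_be_between(
    "amount",
    min_value=0,
    max_value=100,
    result_format={"result_format": "COMPLETE", "include_unexpected_rows": True},
)
print(result.result.get("unexpected_rows"))

# Option 2: always works; negate the expectation in plain PySpark.
failing_rows = df.filter(~F.col("amount").between(0, 100))
failing_rows.show()
```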
2
votes
1 answer
Python Great Expectations with Spark: get result for each row
I am using Python Great Expectations to validate my data using Apache Spark. Basically, I would like to add an is_valid flag against each row of the data frame. To add this flag, I need to apply multiple checks on each column within the row. Great…

sunitha
- 1,468
- 14
- 18
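GE expectations report aggregate results per column rather than a per-row verdict, so a common pattern for a per-row flag is to mirror the same checks as native Spark column expressions and build is_valid directly. A sketch; the column names and rules are placeholders.

```python
from pyspark.sql import SparkSession, functions as F

spark = SparkSession.builder.getOrCreate()
df = spark.createDataFrame(
    [(1, "a@example.com", 10), (2, None, -5)],
    ["id", "email", "amount"],
)

# Mirror each expectation as a boolean column expression.
checks = (
    F.col("id").isNotNull()
    & F.col("email").isNotNull()
    & F.col("amount").between(0, 100)
)

# Per-row flag: True only when every check passes for that row.
flagged_df = df.withColumn("is_valid", checks)
flagged_df.show()
```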
2
votes
0 answers
Great expectations framework - AWS Redshift connection
I'm trying to set up a connection to AWS Redshift from the Great Expectations framework (GE) according to the tutorial, using Python, and am facing two issues:
When I'm using postgresql+psycopg2 as the driver in the connection string in step 5, adding the…

Sebastian Dengler
- 1,258
- 13
- 30
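When the GE datasource step fails, it often helps to first confirm that the connection string works on its own with SQLAlchemy, since GE uses the same string underneath. A sketch; the host, credentials, and driver choice are placeholders, and Redshift is commonly reached either via postgresql+psycopg2 or the sqlalchemy-redshift dialect.

```python
from sqlalchemy import create_engine, text

# Placeholder Redshift connection string; the same string goes into the
# GE datasource configuration once it is verified here.
connection_string = (
    "postgresql+psycopg2://user:password@my-cluster.abc123.eu-west-1"
    ".redshift.amazonaws.com:5439/dev"
)

engine = create_engine(connection_string)
with engine.connect() as conn:
    print(conn.execute(text("SELECT 1")).scalar())
```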
2
votes
1 answer
Test YAML for great-expectations with BigQuery
I am having trouble testing the great-expectations YAML against BigQuery.
I followed the official documentation and got to this code:
import os
import great_expectations as ge
datasource_yaml = """
name: my_bigquery_datasource
class_name:…

elvainch
- 1,369
- 3
- 15
- 32
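For comparison, a commonly shown shape of the legacy datasource YAML for BigQuery looks roughly like the sketch below; the project, dataset, and connector names are placeholders and the exact keys are version-dependent. context.test_yaml_config is the call the legacy docs use to try the configuration out.

```python
import great_expectations as ge

context = ge.get_context()

# Placeholder project/dataset; the bigquery:// connection string comes from
# the sqlalchemy-bigquery dialect.
datasource_yaml = """
name: my_bigquery_datasource
class_name: Datasource
execution_engine:
  class_name: SqlAlchemyExecutionEngine
  connection_string: bigquery://my-gcp-project/my_dataset
data_connectors:
  default_inferred_data_connector_name:
    class_name: InferredAssetSqlDataConnector
    include_schema_name: true
"""

# Legacy helper that parses the YAML and attempts a test connection.
context.test_yaml_config(datasource_yaml)
```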
2
votes
1 answer
How to Convert Great Expectations DataFrame to Apache Spark DataFrame
The following code will convert an Apache Spark DataFrame to a Great_Expectations DataFrame. For example, if I wanted to convert the Spark DataFrame spkDF to a Great_Expectations DataFrame, I would do the following:
ge_df = SparkDFDataset(spkDF)
Can someone…

Patterson
- 1,927
- 1
- 19
- 56
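In the legacy API, SparkDFDataset is just a wrapper around the original DataFrame, and (as far as I recall) the wrapped object is exposed via the spark_df attribute, so no real conversion back is needed. A sketch, treating that attribute name as an assumption:

```python
from pyspark.sql import SparkSession
from great_expectations.dataset import SparkDFDataset

spark = SparkSession.builder.getOrCreate()
spkDF = spark.createDataFrame([(1, "a"), (2, "b")], ["id", "value"])

# Wrap the Spark DataFrame for validation...
ge_df = SparkDFDataset(spkDF)

# ...and get the underlying Spark DataFrame back (assumed attribute name).
plain_spark_df = ge_df.spark_df
plain_spark_df.show()
```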
2
votes
1 answer
Using Python Great Expectations to remove invalid data
I just started with the Great Expectations library and I want to know if it is possible to use it to remove invalid data from a Pandas DataFrame, and how I can do that if it is possible.
I also want to insert the invalid data into a PostgreSQL database.
I…

Florin P.
- 45
- 7
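One way to do this with the legacy pandas API: run the expectations with result_format="COMPLETE", collect the unexpected row indices, then split the DataFrame and write the invalid slice to PostgreSQL with to_sql. The table name, connection string, and checks below are placeholders.

```python
import great_expectations as ge
import pandas as pd
from sqlalchemy import create_engine

df = pd.DataFrame({"id": [1, 2, None], "amount": [10, -5, 20]})
ge_df = ge.from_pandas(df)

# Run checks with COMPLETE result_format so unexpected_index_list is populated.
results = [
    ge_df.expect_column_values_to_not_be_null("id", result_format="COMPLETE"),
    ge_df.expect_column_values_to_be_between("amount", 0, 100, result_format="COMPLETE"),
]

# Collect the indices of every row that failed at least one expectation.
bad_idx = set()
for r in results:
    bad_idx.update(r.result.get("unexpected_index_list", []))

valid_df = df.drop(index=list(bad_idx))
invalid_df = df.loc[sorted(bad_idx)]

# Write the invalid rows to PostgreSQL (placeholder connection string/table).
engine = create_engine("postgresql+psycopg2://user:password@localhost:5432/mydb")
invalid_df.to_sql("invalid_rows", engine, if_exists="append", index=False)
```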
2
votes
1 answer
How to Save a Great Expectations Expectation Suite to Azure Data Lake or Blob Store
I'm trying to save a great_expectations expectation_suite to Azure ADLS Gen 2 or Blob store with the following line of…

Patterson
- 1,927
- 1
- 19
- 56
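If a GE-native Azure store isn't configured, the expectation suite can simply be serialized to JSON and uploaded with the azure-storage-blob SDK (which also covers ADLS Gen2 blob endpoints). A sketch; the connection string, container, and blob names are placeholders, and to_json_dict() is the legacy suite serializer as far as I'm aware.

```python
import json

import great_expectations as ge
import pandas as pd
from azure.storage.blob import BlobServiceClient

# Build a small suite to save; in the question this suite already exists.
ge_df = ge.from_pandas(pd.DataFrame({"id": [1, 2, 3]}))
ge_df.expect_column_values_to_not_be_null("id")
suite = ge_df.get_expectation_suite()

# Serialize the suite to JSON and upload it as a blob (placeholder names).
payload = json.dumps(suite.to_json_dict(), indent=2)
blob_service = BlobServiceClient.from_connection_string("<storage-connection-string>")
blob_client = blob_service.get_blob_client(
    container="expectations", blob="my_expectation_suite.json"
)
blob_client.upload_blob(payload, overwrite=True)
```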
2
votes
2 answers
How to get Great_Expectations to work with Spark Dataframes in Apache Spark - ValueError: Unrecognized spark type: string
I have an Apache Spark dataframe which has a 'string' type field. However, Great_Expectations doesn't recognize the field type. I have imported the modules that I think are necessary, but I am not sure why Great_Expectations doesn't recognize the…

Patterson
- 1,927
- 1
- 19
- 56
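The usual cause reported for this error is passing the pandas/SQL-style type name to expect_column_values_to_be_of_type; with the legacy SparkDFDataset wrapper the expectation wants the Spark type class name (e.g. "StringType" rather than "string"). A sketch, treating that naming detail as an assumption for your GE version:

```python
from pyspark.sql import SparkSession
from great_expectations.dataset import SparkDFDataset

spark = SparkSession.builder.getOrCreate()
df = spark.createDataFrame([(1, "alice"), (2, "bob")], ["id", "name"])

ge_df = SparkDFDataset(df)

# With SparkDFDataset the type must be the Spark class name, not "string".
result = ge_df.expect_column_values_to_be_of_type("name", "StringType")
print(result.success)
```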