Great Expectations is an open source tool that helps teams promote analytic integrity through a distinctive approach to data pipeline testing. Pipeline tests are applied to data (instead of code) and at batch time (instead of compile or deploy time). Pipeline tests are like unit tests for datasets: they help you guard against upstream data changes and monitor data quality. In addition to pipeline testing, GE also provides data documentation and profiling.
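The "unit tests for datasets" idea can be sketched in plain Python. The function below imitates GE's expectation naming and result shape for illustration; it is a stand-in, not the library's API:

```python
# Minimal sketch of a "pipeline test": validate a batch of records at
# batch time, the way a unit test validates code. The function name and
# result keys mimic Great Expectations conventions but are illustrative.

def expect_column_values_to_be_between(rows, column, min_value, max_value):
    """Return a GE-style result dict for a simple range expectation."""
    unexpected = [r[column] for r in rows
                  if not (min_value <= r[column] <= max_value)]
    return {
        "success": not unexpected,
        "unexpected_count": len(unexpected),
        "unexpected_percent": 100.0 * len(unexpected) / len(rows),
    }

batch = [{"age": 34}, {"age": 29}, {"age": 151}]  # one bad record
result = expect_column_values_to_be_between(batch, "age", 0, 120)
print(result["success"], result["unexpected_count"])  # False 1
```

A failing expectation does not raise; it returns a result you can route (alert, quarantine, halt the pipeline), which is what distinguishes batch-time data tests from compile-time code tests.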
Questions tagged [great-expectations]
131 questions
1
vote
0 answers
Validating datasets produced by identical Apache Airflow workflows
I have the same workflow on two different environments. To validate that both workflows are identical, I feed the same input data to both workflows. If they are identical, I expect the output dataset of each workflow to be the same.
In this…
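One low-tech way to check that two workflow runs emitted the same dataset, ignoring row order, is to compare row multisets. A sketch with made-up rows:

```python
# Sketch: compare two pipeline outputs as multisets of rows, so that
# row ordering differences do not count as a mismatch. Rows must be
# hashable (tuples here); the data is illustrative.
from collections import Counter

out_a = [("u1", 10), ("u2", 20)]
out_b = [("u2", 20), ("u1", 10)]  # same rows, different order
print(Counter(out_a) == Counter(out_b))  # True
```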

user929287171
- 91
- 5
1
vote
1 answer
Use Great Expectations to validate pandas DataFrame with existing suite JSON
I'm using the Great Expectations python package (version 0.14.10) to validate some data. I've already followed the provided tutorials and created a great_expectations.yml in the local ./great_expectations folder. I've also created a great…

Jed
- 1,823
- 4
- 20
- 52
1
vote
1 answer
How do I pass multiple CSVs with a custom delimiter to a great_expectations checkpoint
I am trying to run a great_expectations checkpoint on 10 CSV files with a "|" delimiter.
Currently, I have to specify all of this in a YAML file, and only after converting my files from the "|" delimiter to ",".
How can I run this for multiple files without…
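For context on the delimiter issue: a "|"-separated file can be read directly, with no conversion to commas. The stdlib sketch below shows the idea; in GE the equivalent knob is a reader option such as {"sep": "|"} passed through the datasource or checkpoint configuration (the exact key names depend on the GE version):

```python
# Sketch: parse "|"-delimited CSV data directly. Python's csv module
# accepts any single-character delimiter, so no comma conversion step
# is needed. The data here is inline for illustration; a real pipeline
# would loop this over its 10 files.
import csv
import io

raw = "id|name\n1|alice\n2|bob\n"
rows = list(csv.DictReader(io.StringIO(raw), delimiter="|"))
print(rows[0]["name"])  # alice
```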

Ravi
- 35
- 9
1
vote
0 answers
Can Great Expectations segregate good and bad records?
I am using Great Expectations in my ETL data pipeline for a POC. I have a validation which is failing (as expected), and I have the following data in my validation JSON:
"unexpected_count": 205,
"unexpected_percent": 10.25,
…

Kuwali
- 233
- 3
- 13
1
vote
0 answers
great_expectations add checkpoint with batch_spec_passthrough
In great_expectations, I am trying to add a checkpoint to a context. The batch of data refers to a CSV file stored on S3 with a semicolon as the separator. I am loading the batch using PySpark as the connector. I tried the following code:
First I…
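As a sketch, the passthrough payload usually takes the shape below; the key names follow GE's documented batch_spec_passthrough/reader_options pattern for the Spark connector, but verify them against the GE version in use:

```python
# Hypothetical sketch of a batch_spec_passthrough payload for reading a
# semicolon-separated CSV through GE's Spark connector. Key names follow
# GE's documented pattern but should be checked against your GE version.
batch_spec_passthrough = {
    "reader_method": "csv",
    "reader_options": {"sep": ";", "header": True},
}
print(batch_spec_passthrough["reader_options"]["sep"])  # ;
```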

aprospero
- 529
- 3
- 14
1
vote
1 answer
Unable to initialize Snowflake data source
I am trying to access the snowflake datasource using "great_expectations" library.
The following is what I tried so far:
from ruamel import yaml
import great_expectations as ge
from great_expectations.core.batch import BatchRequest,…

cloud_hari
- 147
- 1
- 8
1
vote
0 answers
Great Expectations validation result operations
Is there a way to split a batch of data into two streams:
one for which the expectations are met
The second one for which expectations fail
That is, to split the tested batch of data into two tables/pandas data frames: one that is clean and…
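A sketch of the split, assuming the validation step can report which row indices failed (GE exposes this as unexpected_index_list when the expectation is run with result_format="COMPLETE"); the indices below are hard-coded for illustration:

```python
# Sketch: partition a batch into "clean" and "unexpected" subsets using
# the row indices a validation step flags. In GE these indices would come
# from the validation result (unexpected_index_list); here they are
# hard-coded so the example is self-contained.
batch = [{"score": 10}, {"score": -5}, {"score": 42}, {"score": -1}]
unexpected_index_list = [1, 3]  # would come from the validation result

flagged = set(unexpected_index_list)
bad = [batch[i] for i in unexpected_index_list]
good = [row for i, row in enumerate(batch) if i not in flagged]
print(len(good), len(bad))  # 2 2
```

The same index-based split works on a pandas DataFrame via `drop`/`loc`, yielding the clean and failing frames the question asks about.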

MariaMadalina
- 479
- 6
- 20
1
vote
0 answers
great_expectations data validation on Cassandra
I have multiple tables in a Cassandra keyspace. I want to use Great Expectations to validate my data. I've been trying to use Spark to load data from Cassandra and I was able to create RuntimeBatchRequest using Spark dataframes. However I need to…

alit8
- 41
- 1
- 3
1
vote
1 answer
Airflow - Great Expectations - Getting/Setting config variables
I am currently trying to use the Python data validation package 'Great Expectations'.
I am currently using the GreatExpectationsOperator to call an expectation suite on a particular datasource (a PostgreSQL datasource).
my_ge_task =…

adan11
- 647
- 1
- 7
- 24
1
vote
2 answers
How to create a Python wheel, or determine which modules/libraries are within a Python wheel
I am trying to create a Python Wheel for Great_Expectations. The .whl provided by Great_Expectations exists here https://pypi.org/project/great-expectations/#files - great-expectations 0.13.25. Unfortunately, it appears that this .whl doesn't…
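Since a .whl file is just a zip archive, its modules can be listed with the stdlib zipfile module. The sketch builds a tiny in-memory stand-in for a real wheel; with an actual file you would pass its path to ZipFile instead:

```python
# Sketch: a wheel (.whl) is a zip archive, so its contents can be
# inspected with zipfile alone. Here we build a minimal in-memory
# "wheel" and list it, the same way you would a real file.
import io
import zipfile

buf = io.BytesIO()
with zipfile.ZipFile(buf, "w") as whl:
    whl.writestr("mypkg/__init__.py", "")
    whl.writestr("mypkg-0.1.dist-info/METADATA", "Name: mypkg")

names = zipfile.ZipFile(buf).namelist()
print(names)
```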

Patterson
- 1,927
- 1
- 19
- 56
1
vote
0 answers
great_expectations and scrapy
When I use great_expectations and scrapy in the same project, the two libraries seem to conflict.
When I uninstall either of them everything works fine, but with both installed I get errors.
Here is my stack trace, but I can not…

Ben Muller
- 221
- 1
- 4
- 10
1
vote
1 answer
How to pass a CustomDataAsset to a DataContext to run custom expectations on a batch?
I have a CustomPandasDataset with a custom expectation
from great_expectations.data_asset import DataAsset
from great_expectations.dataset import PandasDataset
from datetime import date, datetime, timedelta
class…

Miguel Trejo
- 5,913
- 5
- 24
- 49
1
vote
1 answer
How to import a Great Expectations custom datasource: ValueError: no package specified for (required for relative module names)
I have this folder structure for my Great Expectations project:
great_expectations/
    dataset/
        __init__.py
        oracle_dataset.py
    datasource/
        __init__.py
        oracle_datasource.py
…
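The ValueError in the title is importlib complaining that a relative module name was given without a package anchor. A minimal reproduction of the distinction, using stdlib modules:

```python
# Sketch: importlib resolves absolute names on its own, but a relative
# name (leading dot) requires the package= argument as an anchor;
# omitting it raises exactly the "no package specified" ValueError.
import importlib

m = importlib.import_module("json.decoder")               # absolute: fine
m2 = importlib.import_module(".decoder", package="json")  # relative: needs package
print(m is m2)  # True
```

In GE configuration this typically means the custom class's module_name must be given as a fully qualified (absolute) dotted path.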

Pierre Delecto
- 455
- 1
- 7
- 26
1
vote
1 answer
How to access an output folder from a PythonScriptStep?
I'm new to azure-ml, and have been tasked with making some integration tests for a couple of pipeline steps. I have prepared some input test data and some expected output data, which I store on a 'test_datastore'. The following example code is a…
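As background, inside a PythonScriptStep the output location is typically handed to the script (for example as a command-line argument), and the script creates the folder itself before writing. The paths below are illustrative stand-ins, not azure-ml API:

```python
# Sketch: a step script receives an output directory path, creates it,
# and writes its results there. A temp directory stands in for the
# path azure-ml would pass to the script.
import os
import tempfile

output_dir = os.path.join(tempfile.mkdtemp(), "step_output")  # stand-in path
os.makedirs(output_dir, exist_ok=True)
result_path = os.path.join(output_dir, "result.csv")
with open(result_path, "w") as f:
    f.write("id,value\n1,0.5\n")
print(os.path.exists(result_path))  # True
```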

Average_guy
- 509
- 4
- 16
1
vote
1 answer
Unable to set up a data source as AWS S3 via CLI and test_yaml_config in great_expectations
great_expectations setup:
Created a new virtual environment
Installed required packages:
pip install boto3
pip install fsspec
pip install s3fs
Updated data source in configuration: great_expectations.yml
datasources:
pandas_s3:
class_name:…

Mohanraj N
- 11
- 1