Questions tagged [great-expectations]

Great Expectations is open source software that helps teams promote analytic integrity by offering a unique approach to data pipeline testing. Pipeline tests are applied to data (instead of code) and at batch time (instead of compile or deploy time). Pipeline tests are like unit tests for datasets: they help you guard against upstream data changes and monitor data quality. In addition to pipeline testing, GE also provides data documentation and profiling.
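To make the "unit tests for datasets" idea concrete, here is a minimal sketch in plain Python. It does not use the Great Expectations library: the two helper functions borrow the names of well-known expectations but are hypothetical stand-ins written for illustration, and the sample batch is made up.

```python
from typing import Any, Dict, List, Optional

Row = Dict[str, Any]


def _result(unexpected: List[int]) -> Dict[str, Any]:
    # Hypothetical result shape, loosely modelled on a validation result.
    return {
        "success": not unexpected,
        "unexpected_count": len(unexpected),
        "unexpected_index_list": unexpected,
    }


def expect_column_values_to_not_be_null(batch: List[Row], column: str) -> Dict[str, Any]:
    """Illustrative stand-in, NOT the library API: flag rows where `column` is None."""
    unexpected = [i for i, row in enumerate(batch) if row.get(column) is None]
    return _result(unexpected)


def expect_column_values_to_be_between(
    batch: List[Row],
    column: str,
    min_value: Optional[float],
    max_value: Optional[float],
) -> Dict[str, Any]:
    """Illustrative stand-in, NOT the library API: flag non-null values outside the range."""
    unexpected = []
    for i, row in enumerate(batch):
        value = row.get(column)
        if value is None:
            continue  # nulls are the job of the not-null check
        too_low = min_value is not None and value < min_value
        too_high = max_value is not None and value > max_value
        if too_low or too_high:
            unexpected.append(i)
    return _result(unexpected)


# The test runs against a batch of data at pipeline time, not against code.
batch = [
    {"name": "Apple", "price": 1.0},
    {"name": "Pear", "price": None},
    {"name": "Plum", "price": 1000.0},
]

null_check = expect_column_values_to_not_be_null(batch, "price")
range_check = expect_column_values_to_be_between(batch, "price", 0, 100)
```

Each check returns a small result record rather than raising, so a pipeline can decide whether a failed expectation should stop the run or only be reported.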

131 questions
2
votes
2 answers

Passing AWS role to the application that uses default boto3 configs

I have an AWS setup that requires me to assume a role and get the corresponding credentials in order to write to S3. For example, to write with the AWS CLI, I need to use the --profile readwrite flag. If I write the code myself with boto, I'd assume the role via STS, get…
Philipp_Kats
  • 3,872
  • 3
  • 27
  • 44
1
vote
1 answer

Using Great Expectations with Databricks Autoloader

I have implemented a data pipeline using Autoloader: bronze --> silver --> gold. While doing this I want to perform some data quality checks, and for that I'm using the Great Expectations library. However, I'm stuck with the error below when trying to…
1
vote
1 answer

Great expectations: UserConfigurableProfiler raises a MetricResolutionError: unhashable type: 'dict'

I am trying to use a profiler to create expectations on certain data batches. import great_expectations as gx from great_expectations.core.batch import BatchRequest from great_expectations.profile.user_configurable_profiler import…
Imad
  • 2,358
  • 5
  • 26
  • 55
1
vote
1 answer

Getting an error while installing the Great Expectations tool locally

ERROR: Could not install packages due to an OSError: [Errno 2] No such file or directory:…
1
vote
0 answers

How can I specify a different database and schema to create temporary tables in Great Expectations?

Great Expectations creates temporary tables. I tried profiling data in my Snowflake lab. It worked because the role I was using could create tables in the schema that contained the tables I was profiling. I tried to profile a table in a Snowflake…
Alex Woolford
  • 4,433
  • 11
  • 47
  • 80
1
vote
1 answer

Is Python Great Expectations compatible with PySpark?

I am implementing data quality checks using the Great Expectations library. Is this library compatible with PySpark? Does it run on multiple cores?
code_bug
  • 355
  • 1
  • 12
1
vote
0 answers

Great-Expectations: How to connect to data stored in S3

s3fs==2022.8.2 great-expectations==0.15.26 It was not easy to find clear documentation and concrete examples for Great-Expectations. After several tries I succeeded in connecting to the S3 bucket: import great_expectations as ge from…
Adil Blanco
  • 616
  • 2
  • 6
  • 23
1
vote
1 answer

How to open the index.html file in Databricks or a browser?

I am trying to open the index.html file through Databricks. Can someone please let me know how to deal with it? I am trying to use GX with Databricks, and currently Databricks stores this file here:…
SeleniumUser
  • 4,065
  • 2
  • 7
  • 30
1
vote
1 answer

Creating an Expectation Suite with an Automated Profiler in Great Expectations

I am a newbie to Great Expectations and am trying to set it up, but I am facing the issue below while creating an Expectation Suite with an Automated Profiler. C:\Users\user\great_expectations>great_expectations --v3-api suite new Using v3 (Batch Request)…
SeleniumUser
  • 4,065
  • 2
  • 7
  • 30
1
vote
1 answer

Great Expectations with Azure and Databricks

I want to run great_expectations test suites against CSV files in my ADLS Gen2. On my ADLS, I have a container called "input" in which I have a file at input/GE/ind.csv. I use an InferredAssetAzureDataConnector. I was able to create and test/validate…
1
vote
0 answers

Display whole rows in great_expectations dashboard

When an expectation fails, I cannot view on the dashboard (the data docs) the entire row that caused the failure, only the column value. For example, if I have a failure because the maximum value of a numerical column is over a threshold,…
aprospero
  • 529
  • 3
  • 14
1
vote
1 answer

Great Expectations Row Based Dimensions

I have data like this: [ { "name": "Apple", "price": 1, "type": "Food" }, { "name": "Apple", "price": 0.90, "type": "Food" }, { "name": "Apple", "price": 1000, …
steve76
  • 302
  • 2
  • 9
1
vote
0 answers

Great Expectations: How to add a partition (column partition) in an Athena External Table in a checkpoint reference in GE?

The setup is GE v3 and I am using AWS Athena as a Data Source. However, I couldn't find a way to tell the "expectation" that the table is actually partitioned with a relative path in S3 like…
nandevers
  • 191
  • 8
1
vote
1 answer

Using great expectations for date validation

We are using great_expectations to validate data using Apache Spark. We are unable to validate columns which have the DATE or DATETIME type. We use the configuration below to check whether date entries in a table are recent. [ …
Akhil Nambiar
  • 315
  • 3
  • 18
1
vote
1 answer

How to integrate Great Expectations into an Airflow project

I'm trying to integrate Great Expectations into an Airflow project, but without success. My question: is there some configuration I need to do? Here are the steps I followed: 1- I generated the Great Expectations project by following this tutorial…
Adil Blanco
  • 616
  • 2
  • 6
  • 23