Questions tagged [amazon-deequ]
57 questions
1 vote, 0 answers
Is it possible to load constraints from a file (CSV, TXT) into Deequ Checks?
Is it possible to save suggested constraints to a file and then load them as checks? I was able to do it without saving them, using the following code:
val allConstraints = suggestionResult.constraintSuggestions.flatMap {
  case (_, suggestions) =>
…

Борис Маринов
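One way to approach this, independent of Deequ's own API, is to persist each suggested constraint as a plain record (target column, constraint name, optional argument) and rebuild checks from those records when loading. The sketch below assumes a hypothetical three-column CSV format and uses only the Python standard library. Turning each loaded record back into a Check call is left to a separate dispatcher.

```python
import csv
import io

# Hypothetical on-disk format: one suggested constraint per row,
# with the target column, a constraint name, and an optional argument.
FIELDS = ["column", "constraint", "argument"]


def save_constraints(rows, fh):
    """Write constraint specs (dicts with the FIELDS keys) as CSV."""
    writer = csv.DictWriter(fh, fieldnames=FIELDS)
    writer.writeheader()
    writer.writerows(rows)


def load_constraints(fh):
    """Read constraint specs back; empty arguments become None."""
    specs = []
    for row in csv.DictReader(fh):
        row["argument"] = row["argument"] or None
        specs.append(row)
    return specs


# An in-memory buffer stands in for a real file on disk.
buf = io.StringIO()
save_constraints(
    [
        {"column": "id", "constraint": "isComplete", "argument": ""},
        {"column": "id", "constraint": "isUnique", "argument": ""},
        {"column": "status", "constraint": "isContainedIn", "argument": "open|closed"},
    ],
    buf,
)
buf.seek(0)
specs = load_constraints(buf)
```

A plain tabular format like this keeps the saved file editable by hand, at the cost of needing an explicit mapping from constraint names back to check methods.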
1 vote, 1 answer
Deequ satisfies function not behaving as expected
I am using PyDeequ to run some checks on my data, but it is not behaving as expected. One of my columns should contain only values between 0 and 1. The data looks like this:
| col 1     |
| 0.5635412 |
| 0.123     |
| 1.0       |
check =…

lr53
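One frequent source of confusion with satisfies is that its assertion is not applied to each value. Deequ evaluates the SQL condition per row, computes the fraction of rows that meet it (the Compliance metric), and then applies the assertion to that single fraction. In PyDeequ, a range check on a column named "col 1" would therefore look roughly like check.satisfies("`col 1` >= 0 AND `col 1` <= 1", "col 1 in [0, 1]", lambda x: x == 1.0). Note the backticks, which Spark SQL needs for a column name containing a space. The pure-Python sketch below mimics these semantics without calling PyDeequ. The out-of-range value 1.7 is made up for illustration.

```python
def compliance(values, condition):
    """Fraction of rows for which `condition` holds; this mirrors the
    Compliance metric that satisfies() computes for its SQL condition."""
    matching = sum(1 for v in values if condition(v))
    return matching / len(values)


# The first three values come from the question; 1.7 is a made-up
# out-of-range row added for illustration.
col_1 = [0.5635412, 0.123, 1.0, 1.7]

fraction = compliance(col_1, lambda v: 0 <= v <= 1)  # 3 of 4 rows -> 0.75

# Pitfall: writing the range check as the *assertion* always passes,
# because the compliance fraction itself is always between 0 and 1.
misused_assertion = lambda x: 0 <= x <= 1
# Intended: every row must satisfy the condition.
intended_assertion = lambda x: x == 1.0

print(misused_assertion(fraction), intended_assertion(fraction))  # True False
```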
1 vote, 1 answer
Amazon Deequ (Spark + Scala) - java.lang.NoSuchMethodError: 'scala.Option org.apache.spark.sql.catalyst.expressions.aggregate.AggregateFunction.toAgg
Spark version: 3.0.1
Amazon Deequ version: deequ-2.0.0-spark-3.1.jar
I'm running the code below in spark-shell on my local machine:
import com.amazon.deequ.analyzers.runners.{AnalysisRunner, AnalyzerContext}
import…

Shiva Krishna
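The versions listed above do not line up: the Deequ artifact is built for Spark 3.1, while the shell runs Spark 3.0.1. Deequ publishes separate artifacts per Spark line, and a NoSuchMethodError on a Spark catalyst class is a typical symptom of this kind of binary mismatch. A likely first step is therefore to use a Deequ build whose -spark-X.Y suffix matches the runtime. The helper below is a hypothetical sketch of that comparison. It only inspects the naming convention and does not verify actual binary compatibility.

```python
import re


def deequ_spark_line(artifact):
    """Extract the Spark major.minor a Deequ artifact was built for,
    e.g. 'deequ-2.0.0-spark-3.1.jar' -> '3.1'."""
    match = re.search(r"-spark-(\d+\.\d+)", artifact)
    if match is None:
        raise ValueError(f"no Spark suffix in {artifact!r}")
    return match.group(1)


def is_compatible(artifact, spark_version):
    """True if the runtime Spark major.minor matches the artifact's suffix."""
    runtime = ".".join(spark_version.split(".")[:2])
    return deequ_spark_line(artifact) == runtime


print(is_compatible("deequ-2.0.0-spark-3.1.jar", "3.0.1"))  # False
```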
1 vote, 1 answer
How to use hasUniqueness check in PyDeequ?
I'm using PyDeequ for data quality checks and I want to check the uniqueness of a set of columns. There is a Check method hasUniqueness, but I can't figure out how to use it.
I'm trying:
check.hasUniqueness([col1, col2], ????)
But what should we use here for…

ruy
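In PyDeequ, hasUniqueness takes the column names as a list of strings and an assertion. The assertion is a function that receives the computed Uniqueness metric and returns a boolean, for example check.hasUniqueness(["col1", "col2"], lambda x: x == 1.0). The pure-Python sketch below mimics what that metric measures. It assumes Deequ's definition of uniqueness as the fraction of rows whose column combination occurs exactly once, and it ignores null handling and does not call PyDeequ.

```python
from collections import Counter


def uniqueness(rows, columns):
    """Fraction of rows whose value combination in `columns` occurs exactly
    once, i.e. (# combinations seen once) / (# rows)."""
    counts = Counter(tuple(row[c] for c in columns) for row in rows)
    unique = sum(1 for n in counts.values() if n == 1)
    return unique / len(rows)


rows = [
    {"col1": "a", "col2": 1},
    {"col1": "a", "col2": 2},
    {"col1": "b", "col2": 1},
    {"col1": "b", "col2": 1},  # duplicate of the previous combination
]

value = uniqueness(rows, ["col1", "col2"])  # 2 unique combinations / 4 rows

# The second argument to hasUniqueness plays the role of this assertion:
# a function from the uniqueness fraction to a boolean.
assert_fully_unique = lambda x: x == 1.0
print(value, assert_fully_unique(value))  # 0.5 False
```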
1 vote, 1 answer
What do the columns of a Deequ check's result DataFrame signify?
So, I ran a simple Deequ check in Spark that went something like this:
val verificationResult: VerificationResult = { VerificationSuite()
  .onData(dataset)
  .addCheck(
    Check(CheckLevel.Error, "Review Check")
      .isComplete("col1")
…

Debapratim Chakraborty
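For reference, the DataFrame returned by VerificationResult.checkResultsAsDataFrame has one row per constraint, with the columns check, check_level, check_status, constraint, constraint_status and constraint_message. constraint_status is the outcome of that single constraint. constraint_message typically explains a failure and is empty on success. check_status summarizes the whole check, so it repeats on every row of that check. The sketch below mirrors that aggregation rule as I understand it from Deequ's check evaluation. It works on plain strings rather than Deequ's CheckStatus and ConstraintStatus types, and the inputs are hypothetical.

```python
def check_status(check_level, constraint_statuses):
    """Derive a check's overall status from its constraints' statuses.

    Any failed constraint makes the check fail at its own level
    ("Error" or "Warning"); otherwise the check is a "Success".
    """
    if any(status == "Failure" for status in constraint_statuses):
        return check_level
    return "Success"


# Hypothetical results for a "Review Check" with two constraints.
print(check_status("Error", ["Success", "Failure"]))    # Error
print(check_status("Warning", ["Success", "Failure"]))  # Warning
print(check_status("Error", ["Success", "Success"]))    # Success
```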
1 vote, 1 answer
Using Deequ on AWS Glue
I am using Deequ on AWS Glue. Surprisingly, when I try to run hasMaxLength, which is listed under Checks for the VerificationSuite, I get the following error. Can someone help? All other checks pass/run. It says the check hasMaxLength…

user3476582
1 vote, 1 answer
Pyspark version of Amazon Deequ
I am working on AWS Glue and using the PySpark API for my ETL.
I believe that if I want to use Amazon Deequ, I need to switch to Scala. However, I still want to continue using the PySpark APIs. Is there a way around this?
If yes, what are the steps I need to follow in…

Ankur Shrivastava
1 vote, 1 answer
Histogram in anomaly detection (Deequ library)
Can we use the Histogram analyzer in anomaly detection?
Let's say I want to check for a change in the ratio of values in a specified column. For example,
a histogram analysis for a column with Male and Female as values gives something like (Male - 0.6)…

Sarvesh Vishwakarma
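As far as I know, Deequ's anomaly checks expect an analyzer that produces a single numeric metric, while Histogram produces a distribution. One hedged workaround is to track each category's ratio as its own number. The sketch below is a hand-rolled illustration of that idea in plain Python, not Deequ's anomaly-detection API. It computes per-value ratios for two runs, as in the Male/Female example, and flags categories whose share moved by more than a hypothetical absolute threshold.

```python
def ratios(values):
    """Share of each distinct value, like the ratios a histogram reports."""
    total = len(values)
    counts = {}
    for v in values:
        counts[v] = counts.get(v, 0) + 1
    return {k: n / total for k, n in counts.items()}


def ratio_shifts(previous, current, max_abs_change):
    """Return {category: (old_ratio, new_ratio)} for categories whose
    ratio changed by more than max_abs_change between two runs."""
    keys = set(previous) | set(current)
    return {
        k: (previous.get(k, 0.0), current.get(k, 0.0))
        for k in keys
        if abs(current.get(k, 0.0) - previous.get(k, 0.0)) > max_abs_change
    }


yesterday = ratios(["Male"] * 6 + ["Female"] * 4)  # Male 0.6, Female 0.4
today = ratios(["Male"] * 8 + ["Female"] * 2)      # Male 0.8, Female 0.2
print(ratio_shifts(yesterday, today, max_abs_change=0.1))
```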
1 vote, 1 answer
Adding new suggestion rule in deequ
I would like to add several new rules to the Deequ suggestion workflow. For example, Deequ offers a check for whether a column contains URLs (containsURL), and I would like to write a corresponding suggestion rule.
I would appreciate suggestions on how to do…

dejan
1 vote, 1 answer
Requesting advice on big data validation
I'm new to big data validation and processing. I have a little experience with datacompy, which I have used to compare two datasets (pandas). However, I couldn't find any source that can do data validation, e.g. column validations on emails,…

user157023
1 vote, 1 answer
Building a function to add checks to the Amazon Deequ framework
Using the Amazon Deequ library, I'm trying to build a function that takes three parameters: the check object, a string naming the constraint to run, and another string that provides the constraint criteria. I have a bunch of checks that I want to…

Riyan Mohammed
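A common way to structure such a function is a lookup table from constraint name to a small builder that parses the criteria string and calls the matching Check method. The sketch below follows PyDeequ's method names (isComplete, isUnique, hasSize), whose methods return the check so calls can be chained. The criteria formats (a column name, or a minimum row count like "1000") and the helper names are hypothetical.

```python
def _is_complete(check, criteria):
    # criteria is a column name, e.g. "id"
    return check.isComplete(criteria)


def _is_unique(check, criteria):
    # criteria is a column name, e.g. "id"
    return check.isUnique(criteria)


def _has_min_size(check, criteria):
    # criteria is a minimum row count, e.g. "1000" (a made-up format)
    minimum = int(criteria)
    return check.hasSize(lambda n: n >= minimum)


# Hypothetical mapping from constraint name to the function that applies it.
BUILDERS = {
    "isComplete": _is_complete,
    "isUnique": _is_unique,
    "hasSize": _has_min_size,
}


def add_constraint(check, constraint, criteria):
    """Apply one named constraint to `check` and return the resulting check."""
    builder = BUILDERS.get(constraint)
    if builder is None:
        raise ValueError(f"unsupported constraint: {constraint!r}")
    return builder(check, criteria)
```

Usage would look like check = add_constraint(check, "isComplete", "id"). An explicit table, rather than looking methods up by name, keeps the set of allowed constraints visible and gives each one a place to parse its own criteria.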
1 vote, 2 answers
Compute Metrics by using Deequ with Scala
I am new to Scala and Amazon Deequ. I have been asked to write Scala code that computes metrics (e.g. Completeness, CountDistinct, etc.) on constraints using Deequ on source CSV files stored in S3, and loads the generated metrics into a Glue…

marie20
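To make the two metrics named above concrete: Completeness is the fraction of non-null values in a column, and CountDistinct is the number of distinct values. The plain-Python sketch below computes both on a small in-memory CSV that stands in for the S3 files. It assumes that empty fields mean null and excludes nulls from the distinct count. In Deequ itself these would be analyzers run through AnalysisRunner, and writing the results to Glue is not shown.

```python
import csv
import io

# Stand-in for a source CSV file; empty fields are treated as null here.
SAMPLE = """id,country
1,DE
2,
3,FR
4,DE
"""


def completeness(rows, column):
    """Fraction of rows with a non-null (non-empty) value in `column`."""
    return sum(1 for r in rows if r[column] != "") / len(rows)


def count_distinct(rows, column):
    """Number of distinct non-null values in `column`."""
    return len({r[column] for r in rows if r[column] != ""})


rows = list(csv.DictReader(io.StringIO(SAMPLE)))
metrics = {
    ("Completeness", "country"): completeness(rows, "country"),
    ("CountDistinct", "country"): count_distinct(rows, "country"),
}
print(metrics)
```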
0 votes, 0 answers
Error using PyDeequ Profile in Databricks
I am new to Python, Databricks, and PyDeequ, and I'm trying to use PyDeequ in Databricks. I installed the library via Maven using "com.amazon.deequ:deequ:2.0.4-spark-3.3". The analyzers work, but the profile runner does not.
I am trying to run this…

Azul Selser
0 votes, 0 answers
Amazon Deequ does not run in a container but works locally
I am unable to run Deequ functionality when I run the job on k8s, although it works correctly locally. I am using 2.0.0-spark-3.1 as the dependency. As a trivial test, I tried to run the following:
val df =…

Rupam Bhattacharjee
0 votes, 0 answers
Unable to pass variable to Deequ Checks
I am trying to implement a Deequ check: the number of distinct date_start values should match the number of days between 2018-01-01 and $runDate.
Here is what I do:
Calculate the date diff:
val min_dt = LocalDate.of(2018, 1, 1)
// Adjusting max_dt to account for the Airflow…

kp_ihm
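The key point is that the expected count is an ordinary value computed before the check is defined. The assertion passed to the check can then simply close over it; in Scala Deequ, for example, an assertion such as `_ == expected` on a distinct-count constraint like hasNumberOfDistinctValues. The Python sketch below shows only the date arithmetic and the closure. It assumes a hypothetical run date and that both the start date and the run date should be counted. Whether the range is inclusive depends on the data.

```python
from datetime import date


def expected_distinct_days(start, run_date, inclusive=True):
    """Number of calendar days from `start` to `run_date`.

    With inclusive=True both endpoints count, which matches a column that
    has one row per day from `start` up to and including `run_date`.
    """
    days = (run_date - start).days
    return days + 1 if inclusive else days


min_dt = date(2018, 1, 1)
run_date = date(2018, 1, 31)  # hypothetical run date
expected = expected_distinct_days(min_dt, run_date)  # 31

# Capture the computed number in the assertion *before* building the check,
# so the check compares the distinct count against a plain integer.
assertion = lambda distinct_count: distinct_count == expected
```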