Questions tagged [amazon-deequ]
57 questions
0
votes
0 answers
Get Error Records from deequ VerificationSuite
When we run a Deequ VerificationSuite, can we see the input records that failed each rule when a rule reports an error? For example, if rule1 failed for 10 records out of 100, I see only a summary which says this…

PythonDeveloper
- 289
- 1
- 4
- 24
0
votes
0 answers
How do I import PyDeequ in Glue Jupyter notebooks?
I have been trying to import PyDeequ to develop tests in AWS Glue's notebook environment. I have downloaded the pydeequ.zip file and the jar file (deequ-2.0.0-spark-3.1.jar). Both of them are in an S3 bucket. I am using Glue 3.0 which…

Jonathan
- 46
- 3
0
votes
0 answers
Could table schema changes affect Deequ on AWS SDLF?
I'm using Deequ in a solution based on the Serverless DataLake Framework workshop. The Deequ process used to work successfully, but I made some changes to the table's schema, and now the Glue job "sdlf-data-quality-controller" is throwing…

Artemination
- 703
- 2
- 10
- 30
0
votes
1 answer
How to set dynamic assert conditions for Deequ verification checks in Scala
I am using the Deequ VerificationSuite to validate my SQL tables, but I am unable to implement dynamic assert conditions for checks:
val verificationResult: VerificationResult = { VerificationSuite()
.onData(dataset)
.addCheck(
…

vibhor Gupta
- 103
- 11
0
votes
1 answer
How to filter rows that violate constraints in Deequ
To unit test my data, I am using PyDeequ. Is there a way to filter out the rows which violate the defined constraints? I was not able to find anything online. Here is my code:
df1 = (spark
.read
.format("csv")
…

leop
- 41
- 7
0
votes
1 answer
Deequ - How can one "train" Deequ on a numeric trend?
Let's say we have a column with a number that increases a bit each day, but we cannot predict the increase with good precision.
For example (the value on day_x is):
day_1 = 10,
day_2 = 20,
day_3 = 35,
day_4 = 22, (a sudden decrease here)…
0
votes
1 answer
Spark Compatible Data Quality Framework for Narrow Data
I'm trying to find an appropriate data quality framework for very large amounts of time series data in a narrow format.
Imagine billions of rows of data that look kind of like…

Valentin
- 641
- 8
- 12
0
votes
1 answer
How to pass Cardinality Threshold value for Histogram in Deequ package?
By default the variable DEFAULT_CARDINALITY_THRESHOLD is set to 120 in Deequ. This is very low for our use case.
Can anyone please suggest if we can set this value to a higher number?

Sambit Jasu
- 1
- 2
0
votes
1 answer
Inferred type arguments [_$1] do not conform to method type parameter bounds
I have a case class:
case class AnomalyCheckConfigBuilder[S <: State[S]](anomalyDetectionStrategy: AnomalyDetectionStrategy,
analyzer: Analyzer[S, Metric[Double]],
…

Shiv
- 105
- 7
0
votes
0 answers
Unit Testing Apache Spark Application with IntelliJ Results in Error
I have a Spark application that is supposed to perform a data preparation step. I have some unit tests that check data quality using Deequ, and I wanted to run one of them as usual, but I'm running into errors as below:
Error while…

joesan
- 13,963
- 27
- 95
- 232
0
votes
1 answer
What are the compatible dependency versions for Amazon Deequ?
I have written code for Amazon Deequ which is failing due to a version issue. My system has Spark 2.4.0. Can anyone please suggest which versions of Deequ, Scala, fasterxml, etc. are compatible with it? I am getting INFO like multiple…

Anu Shivangi
- 45
- 5
0
votes
1 answer
Not able to create object of desired type in Java
I'm using Deequ to write an analyzer. My editor is showing me a warning and I'm not sure how to fix it.
On this line:
Analyzer analyzer = new PatternMatch("email", Patterns.EMAIL(), option);
I get this warning in IntelliJ.
Raw use of…

Ruchit
- 336
- 3
- 16
0
votes
1 answer
How to submit a PyDeequ job from a Jupyter notebook to Spark/YARN
How do I configure the environment to submit a PyDeequ job to Spark/YARN (client mode) from a Jupyter notebook? There is no comprehensive explanation other than those using the environment. How do I set up the environment to use with non-AWS…

mon
- 18,789
- 22
- 112
- 205
0
votes
1 answer
ConstraintSuggestionRunner not picking up columns enclosed in backticks
I am currently importing a dataset from an Excel sheet which has a column name containing a dot character, like this: "abc.xyz".
I went through a couple of Stack Overflow questions, and they say that we can replace it with the column name enclosed in backticks, like…

mshikher
- 174
- 3
- 20
0
votes
1 answer
How to check if values of 'column1' are within +-20% range of values of 'column2' using Amazon Deequ?
So, I'm using Amazon Deequ in Spark, and I have a DataFrame 'df' with two columns of type 'Long' or another numeric type. I simply want to check:
value(column1) lies between value(column2)-20% and value(column2)+20% for all rows
I'm not sure what check to…

Debapratim Chakraborty
- 375
- 3
- 15
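
The row-level condition in the last question above (column1 within ±20% of column2) can be illustrated outside Spark. The following is a minimal plain-Python sketch of the arithmetic only, not a Deequ answer: the function name `within_relative_band` and the sample rows are hypothetical. In Deequ itself, a condition like this would typically be written as a SQL predicate string (for instance `column1 BETWEEN 0.8 * column2 AND 1.2 * column2`) and passed to the `satisfies` check; that wiring is an assumption and is not shown here.

```python
def within_relative_band(value, reference, tolerance=0.20):
    """Return True if value lies within reference +/- tolerance * |reference|."""
    # Using abs() keeps the band well-formed when the reference is negative.
    margin = abs(reference) * tolerance
    return reference - margin <= value <= reference + margin


# Hypothetical (column1, column2) pairs standing in for DataFrame rows.
rows = [(110, 100), (130, 100), (70, 100), (90, 100)]

# Collect the rows that fall outside the +/-20% band.
violations = [row for row in rows if not within_relative_band(*row)]
print(violations)
```

Collecting the violating rows, rather than only a pass/fail flag, mirrors what several of the questions in this listing ask for: seeing which records broke a rule, not just the summary.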