Questions tagged [amazon-deequ]
57 questions
0
votes
1 answer
Is it possible to run Deequ anomaly detection on multiple partitions separately in parallel
We have Spark dataframes partitioned on multiple columns. For example, we have a partner column that can be Google, Facebook, and Bing. And we have a channel column that can be PLA and Text. We would like to run anomaly detection on Google-PLA,…

Sifang
- 11
- 3
0
votes
2 answers
Unable to run amazon deequ examples locally
I am trying to run and test the Amazon Deequ library locally, but I repeatedly get a class-not-found error for various examples. Exact error:
java.lang.NoClassDefFoundError: scala/Product$class
at…

ReyAhamed
- 43
- 10
0
votes
1 answer
Using reflections to access methods in Amazon Deequ
I plan on creating a user config file that I will later parse in order to run some checks from Amazon Deequ. I want to be able to pass the string names from the config file to get the methods; however, in my attempts to do so, I keep hitting…

dustin
- 4,309
- 12
- 57
- 79
0
votes
1 answer
Failed to load : com/amazon/deequ/checks/Check
I'm building a Spark application to load two JSON files, compare them, and print the differences. I also try to validate these files using the Amazon Deequ library, but I'm getting the exception below:
WARNING: Use --illegal-access=warn to enable…

Arar
- 1,926
- 5
- 29
- 47
0
votes
2 answers
Azure DataBricks - Deequ - Finding Rows that failed on a check
I followed https://aws.amazon.com/blogs/big-data/test-data-quality-at-scale-with-deequ/
and got the checks and verification running.
But I am not able to find out exactly which rows of my data are failing.
That is a very important part,…

Ayush Aryan
- 23
- 5
0
votes
2 answers
Adding a Check based on a Compliance analyzer
Here is the sample data frame (df) I'm working with:
+---+----+--------+
| id|orig|scrubbed|
+---+----+--------+
|  1|   a|       a|
|  2|   B|       b|
|  3|   c|       c|
|  4|   D|       d|
|  5|   *|      XX|
|  6|   $|      XX|
|  7|  ZZ| …

Michael Burkhardt
- 21
- 3
0
votes
1 answer
How can I save Deequ Constraint Suggestions to a file for use again?
Hi, I am using Amazon Deequ to generate a set of constraints for data quality checks on my data.
I want to save the constraint suggestion object to HDFS so I can load it and use it for verification any time I want to run a data quality check.
How can I save…

martinl
- 1
- 2
0
votes
0 answers
Print the metrics directly without mentioning the column names explicitly Scala
I have the following code in Scala:
import com.amazon.deequ.analyzers.runners.{AnalysisRunner, AnalyzerContext}
import com.amazon.deequ.analyzers.runners.AnalyzerContext.successMetricsAsDataFrame
import com.amazon.deequ.analyzers.{Compliance,…

user10273140
- 21
- 3
0
votes
1 answer
How to call Amazon Deequ hasDataType from Java
I am trying to implement Amazon Deequ functionality from Java.
I am trying to add data type constraints but am not able to pass the third parameter (assertion) from Java:
com.amazon.deequ.constraints.Constraint constrains = Constraint.dataTypeConstraint("test",…
0
votes
1 answer
Parse DQ rules from Excel in AWS Deequ
Does anyone have an example of how to parse data quality rules from an Excel sheet in AWS Deequ?

Mayank Srivastava
- 77
- 2
- 11
0
votes
1 answer
How to filter rows with column constraint in Deequ ColumnProfilerRunner?
I am new to Scala and Spark. I am exploring the Amazon Deequ library for data profiling.
How do I get the count of rows having a particular value while using ColumnProfilerRunner()?
The AnalysisRunner has a "compliance" option; I am looking for a…

Ravi
- 117
- 1
- 2
- 10
0
votes
1 answer
Scala Spark: how to add list of generated methods to a function
I am using Amazon Deequ to generate test cases, which returns the following list of methods that I want to use in a further function instead of coding each one individually.
var rows = suggestionDataFrame.select("_3").collect().map(_.getString(0)).mkString("…

Sandeep Singh
- 7,790
- 4
- 43
- 68